Abstract



We study the scenario of graph-based clustering algorithms such as spectral clustering.
Given a set of data points, one first has to construct a graph on the data points and then
apply a graph clustering algorithm to find a suitable partition of the graph. Our main question
is if and how the construction of the graph (choice of the graph, choice of parameters, choice
of weights) influences the outcome of the final clustering result. To this end we study the
convergence of cluster quality measures such as the normalized cut or the Cheeger cut on
various kinds of random geometric graphs as the sample size tends to infinity. It turns out
that the limit values of the same objective function are systematically different on different
types of graphs. This implies that clustering results systematically depend on the graph
and can be very different for different types of graph. We provide examples to illustrate the
implications on spectral clustering.

1 Introduction

Nowadays it is very popular to represent and analyze statistical data using random graph or 
network models. The vertices in such a graph correspond to data points, whereas edges in the
graph indicate that the adjacent vertices are "similar" or "related" to each other. In this paper 
we consider the problem of data clustering in a random geometric graph setting. We are given 
a sample of points drawn from some underlying probability distribution on a metric space. The 
goal is to cluster the sample points into "meaningful groups". A standard procedure is to first
transform the data to a neighborhood graph, for example a k-nearest neighbor graph. In a second
step, the cluster structure is then extracted from the graph: clusters correspond to regions in the 
graph that are tightly connected within themselves and only sparsely connected to other clusters. 

There already exist a couple of papers that study statistical properties of this procedure in a
particular setting: when the true underlying clusters are defined to be the connected components
of a density level set in the underlying space. In this setting, a test for detecting cluster structure and
outliers is proposed in Brito et al. (1997). In Biau et al. (2007) the authors build a neighborhood
graph in such a way that its connected components converge to the underlying true clusters in the
data. Maier et al. (2009a) compare the properties of different random graph models for identifying
clusters of the density level sets.

While the definition of clusters as connected components of level sets is appealing from a theoretical
point of view, the corresponding algorithms are often too simplistic and only moderately successful
in practice. From a practical point of view, clustering methods based on graph partitioning
algorithms are more robust. Clusters do not have to be perfectly disconnected in the graph, but are
allowed to have a small number of connecting edges between them. Graph partitioning methods
are widely used in practice. The most prominent algorithm in this class is spectral clustering,
which optimizes the normalized cut (NCut) objective function (see below for exact definitions,
and von Luxburg (2007) for a tutorial on spectral clustering). It is already known under what
circumstances spectral clustering is statistically consistent (von Luxburg et al. 2008). However,
there is one important open question. When applying graph-based methods to given sets of data 
points, one obviously has to build a graph first, and there are several important choices to be 
made: the type of the graph (for example, the k-nearest neighbor graph, the r-neighborhood graph or
a Gaussian similarity graph), the connectivity parameter (k or r or σ, respectively) and the weights
of the graph. Making such choices is not so difficult in the domain of supervised learning, where 
parameters can be set using cross-validation. However, it poses a serious problem in unsupervised 
learning. While different researchers use different heuristics and their "gut feeling" to set these
parameters, no systematic empirical studies have been conducted (for example, on how sensitive
the results are to the choice of graph parameters), nor do theoretical results exist that lead to
well-justified heuristics.

In this paper we study the question if and how the results of graph-based clustering algorithms 
are affected by the graph type and the parameters that are chosen for the construction of the 
neighborhood graph. We focus on the case where the best clustering is defined as the partition 
that minimizes the normalized cut (NCut) or the Cheeger cut.

Our theoretical setup is as follows. In a first step we ignore the problem of actually finding 
the optimal partition. Instead we fix some partition of the underlying space and consider it as 
the "true" partition. For any finite set of points drawn from the underlying space we consider the 
clustering of the points that is induced by this underlying partition. Then we study the convergence 
of the NCut value of this clustering as the sample size tends to infinity. We investigate this question 
on different kinds of neighborhood graphs. Our first main result is that depending on the type of 
graph, the clustering quality measure converges to different limit values. For example, depending 
on whether we use the kNN graph or the r-graph, the limit functional integrates over different 
powers of the density. From a statistical point of view, this is very surprising because in many
other respects, the kNN graph and the r-graph behave very similarly to each other. Just consider
the related problem of density estimation. Here, both the k-nearest neighbor density estimate
and the estimate based on the degrees in the r-graph converge to the same limit, namely the true 
underlying density. So it is far from obvious that the NCut values would converge to different 
limits. 

In a second step we then relate these results to the setting where we optimize over all partitions 
to find the one that minimizes the NCut. We can show that the results from the first part can 
lead to the effect that the minimizer of NCut on the kNN graph is different from the minimizer
of NCut on the r-graph or on the complete graph with Gaussian weights. This effect can also be 
studied in practical examples. First, we give examples of well-clustered distributions (mixtures 
of Gaussians) where the optimal limit cut on the kNN graph is different from the one on the 
r-neighborhood graph. The optimal limit cuts in these examples can be computed analytically. 
Next we demonstrate that this effect can already be observed on finite samples from these
distributions. Given a finite sample, running normalized spectral clustering to optimize NCut leads
to systematically different results on the kNN graph than on the r-graph. This shows that our 
results are not only of theoretical interest, but that they are highly relevant in practice. 

In the following section we formally define the graph clustering quality measures and the
neighborhood graph types we consider in this paper. Furthermore, we introduce the notation and technical
assumptions for the rest of the paper. In Section 3 we present our main results on the convergence
of NCut and the CheegerCut on different graphs. In Section 4 we show that our findings are
not only of theoretical interest, but that they also influence concrete algorithms such as spectral
clustering in practice. All proofs are deferred to Section 6. Note that a small part of the results of
this paper has already been published in Maier et al. (2009b).

2 Definitions and assumptions



Given a directed graph $G = (V, E)$ with weights $w : E \to \mathbb{R}$ and a partition of the nodes $V$ into
$(U, V \setminus U)$ we define

$$\mathrm{cut}(U, V \setminus U) = \sum_{u \in U,\, v \in V \setminus U} \bigl( w(u,v) + w(v,u) \bigr),$$

and $\mathrm{vol}(U) = \sum_{u \in U,\, v \in V} w(u,v)$. If $G$ is an undirected graph we replace the ordered pair $(u,v)$
in the sums by the unordered pair $\{u, v\}$. Note that by doing so we count each edge twice in the
undirected graph. This introduces a constant of two in the limits but it has the advantage that
there is no need to distinguish in the formulation of our results between directed and undirected
graphs.

Intuitively, the cut measures how strong the connection between the different clusters in the
clustering is, whereas the volume of a subset of the nodes measures the "weight" of the subset in terms
of the edges that originate in it. An ideal clustering would have a low cut and balanced clusters,
that is clusters with similar volume. The graph clustering quality measures that we use in this
paper, the normalized cut and the Cheeger cut, formalize this trade-off in slightly different ways:
The normalized cut is defined by

$$\mathrm{NCut}(U, V \setminus U) = \mathrm{cut}(U, V \setminus U) \left( \frac{1}{\mathrm{vol}(U)} + \frac{1}{\mathrm{vol}(V \setminus U)} \right), \tag{1}$$

whereas the Cheeger cut is defined by

$$\mathrm{CheegerCut}(U, V \setminus U) = \frac{\mathrm{cut}(U, V \setminus U)}{\min\{\mathrm{vol}(U), \mathrm{vol}(V \setminus U)\}}. \tag{2}$$
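
As a minimal sketch of these definitions, the following NumPy code evaluates the cut, the volume, NCut and CheegerCut for a graph given by a weight matrix. The function names and the toy graph are illustrative assumptions and do not come from the paper; an undirected graph is stored as a symmetric matrix, which reproduces the convention that each undirected edge is counted twice.

```python
import numpy as np

def cut_value(W, in_U):
    """Sum of w(u, v) + w(v, u) over all u in U and v in the complement of U."""
    in_U = np.asarray(in_U, dtype=bool)
    return W[np.ix_(in_U, ~in_U)].sum() + W[np.ix_(~in_U, in_U)].sum()

def volume(W, in_U):
    """Total weight of all edges (u, v) that originate in U."""
    in_U = np.asarray(in_U, dtype=bool)
    return W[in_U, :].sum()

def ncut(W, in_U):
    """Normalized cut, Equation (1)."""
    in_U = np.asarray(in_U, dtype=bool)
    return cut_value(W, in_U) * (1.0 / volume(W, in_U) + 1.0 / volume(W, ~in_U))

def cheeger_cut(W, in_U):
    """Cheeger cut, Equation (2)."""
    in_U = np.asarray(in_U, dtype=bool)
    return cut_value(W, in_U) / min(volume(W, in_U), volume(W, ~in_U))

# Toy graph (hypothetical): two triangles joined by the single edge {2, 3}.
W = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0
in_U = np.array([True, True, True, False, False, False])

example_cut = cut_value(W, in_U)
example_ncut = ncut(W, in_U)
example_cheeger = cheeger_cut(W, in_U)
```

In this toy graph only one undirected edge crosses the partition, and both sides have the same volume, so the two quality measures differ only by the factor of two that comes from summing the two reciprocal volumes.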

These definitions are useful for general weighted graphs and general partitions. As stated at
the beginning, we want to study the values of NCut and CheegerCut on neighborhood graphs on
sample points in Euclidean space and for partitions of the nodes that are induced by a hyperplane
$S$ in $\mathbb{R}^d$. The two halfspaces belonging to $S$ are denoted by $H^+$ and $H^-$. Having a neighborhood
graph on the sample points $\{x_1, \ldots, x_n\}$, the partition of the nodes induced by $S$ is
$(\{x_1, \ldots, x_n\} \cap H^+, \{x_1, \ldots, x_n\} \cap H^-)$. In the rest of this paper, for a given neighborhood graph $G_n$ we set
$\mathrm{cut}_n = \mathrm{cut}(\{x_1, \ldots, x_n\} \cap H^+, \{x_1, \ldots, x_n\} \cap H^-)$. Similarly, for $H = H^+$ or $H = H^-$ we set
$\mathrm{vol}_n(H) = \mathrm{vol}(\{x_1, \ldots, x_n\} \cap H)$. Accordingly we define $\mathrm{NCut}_n$ and $\mathrm{CheegerCut}_n$.

In the following we introduce the different types of neighborhood graphs and weighting schemes 
that are considered in this paper. The graph types are: 

• The k-nearest neighbor (kNN) graphs, where the idea is to connect each point to its k nearest
neighbors. However, this yields a directed graph, since the k-nearest neighbor relationship
is not symmetric. If we want to construct an undirected kNN graph we can choose between
the mutual kNN graph, where there is an edge between two points if both points are among
the k nearest neighbors of the other one, and the symmetric kNN graph, where there is an
edge between two points if at least one point is among the k nearest neighbors of the other one.
In our proofs for the limit expressions it will become clear that these do not differ between
the different types of kNN graphs. Therefore, we do not distinguish between them in the
statement of the theorems, but rather speak of "the kNN graph".

• The r-neighborhood graph, where a radius r is fixed and two points are connected if their
distance does not exceed the threshold radius r. Note that due to the symmetry of the 
distance we do not have to distinguish between directed and undirected graphs. 

• The complete weighted graph, where there is an edge between each pair of distinct nodes (but 
no loops). Of course, in general we would not consider this graph a neighborhood graph. 
However, if the weight function is chosen in such a way that the weights of edges between 
nearby nodes are high and the weights between points far away from each other are almost
negligible, then the behavior of this graph should be similar to that of a neighborhood graph. 
One such weight function is the Gaussian weight function, which we introduce below. 

The weights that are used on neighborhood graphs usually depend on the distance of the end
nodes of the edge and are non-increasing. That is, the weight $w(x_i, x_j)$ of an edge $(x_i, x_j)$ is given
by $w(x_i, x_j) = f(\mathrm{dist}(x_i, x_j))$ with a non-increasing weight function $f$. The weight functions we
consider here are the unit weight function $f \equiv 1$, which results in the unweighted graph, and the
Gaussian weight function

$$f(u) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left( -\frac{u^2}{2\sigma^2} \right)$$

with a parameter $\sigma > 0$ defining the bandwidth.
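
The graph types and weighting schemes described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the construction used in the experiments: the function names are hypothetical, distances are Euclidean, ties between equidistant neighbors are broken arbitrarily by the sort, and the Gaussian weight is assumed to carry the normalisation $(2\pi\sigma^2)^{-d/2}$, so that its maximum is linear in $\sigma^{-d}$.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X (shape n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_adjacency(X, k, kind="directed"):
    """Boolean adjacency of the directed, mutual or symmetric kNN graph."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)              # a point is not its own neighbor
    n = len(X)
    nearest = np.argsort(D, axis=1)[:, :k]   # indices of the k nearest neighbors
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nearest.ravel()] = True
    if kind == "mutual":
        return A & A.T                       # both points list each other
    if kind == "symmetric":
        return A | A.T                       # at least one point lists the other
    return A

def r_adjacency(X, r):
    """Boolean adjacency of the r-neighborhood graph (no loops)."""
    A = pairwise_distances(X) <= r
    np.fill_diagonal(A, False)
    return A

def gaussian_weights(X, sigma):
    """Complete weighted graph with Gaussian weights (assumed normalisation)."""
    d = X.shape[1]
    D = pairwise_distances(X)
    W = np.exp(-D ** 2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2) ** (d / 2.0)
    np.fill_diagonal(W, 0.0)
    return W

# Four points on the real line (hypothetical toy data).
X = np.array([[0.0], [1.0], [2.5], [6.0]])
A_directed = knn_adjacency(X, k=1)
A_mutual = knn_adjacency(X, k=1, kind="mutual")
A_symmetric = knn_adjacency(X, k=1, kind="symmetric")
A_r = r_adjacency(X, r=1.5)
W_gauss = gaussian_weights(X, sigma=1.0)
```

A kNN graph or r-graph with Gaussian weights is then obtained by the elementwise product of `W_gauss` with the corresponding adjacency matrix.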

Of course, not every weighting scheme is suitable for every graph type. For example, as mentioned 
above, we would hardly consider the complete graph with unit weights a neighborhood graph. 
Therefore, we only consider the Gaussian weight function for this graph. On the other hand, for 
the kNN graph and the r-neighborhood graph with Gaussian weights there are two "mechanisms"
that reduce the influence of far-away nodes: first the fact that far-away nodes are not connected
to each other by an edge and second the decay of the weight function. In fact, it turns out that
the limit expressions we study depend on the interplay between these two mechanisms. Clearly,
the decay of the weight function is governed by the parameter $\sigma$. For the r-neighborhood graph
the radius $r$ limits the length of the edges. Asymptotically, given sequences $(\sigma_n)_{n \in \mathbb{N}}$ and $(r_n)_{n \in \mathbb{N}}$
of bandwidths and radii we distinguish between the following two cases:

• the bandwidth $\sigma_n$ is dominated by the radius $r_n$, that is $\sigma_n / r_n \to 0$ for $n \to \infty$,

• the radius $r_n$ is dominated by the bandwidth $\sigma_n$, that is $r_n / \sigma_n \to 0$ for $n \to \infty$.

For the kNN graph we cannot give a radius up to which points are connected by an edge, since this 
radius for each point is a random variable that depends on the positions of all the sample points. 
However, it is possible to show that for a point in a region of constant density $p$ the $k_n$-nearest
neighbor radius is concentrated around $\sqrt[d]{k_n / ((n-1)\, \eta_d\, p)}$, where $\eta_d$ denotes the volume of the
unit ball in Euclidean space $\mathbb{R}^d$. That is, the kNN radius decays to zero with the rate $\sqrt[d]{k_n / n}$. In
the following it is convenient to set for the kNN graph $r_n = \sqrt[d]{k_n / n}$, noting that this is not the
k-nearest neighbor radius of any point but only its decay rate. Using this "radius" we distinguish
between the same two cases of the ratio of $r_n$ and $\sigma_n$ as for the r-neighborhood graph.
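
The concentration of the kNN radius can be checked numerically. The sketch below is an illustration only: it assumes a uniform density $p = 1$ on the unit square, a hypothetical sample size and value of $k$, and it measures the radius at a point placed at the centre of the square so that boundary effects play no role.

```python
import math
import numpy as np

def unit_ball_volume(d):
    """Volume eta_d of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def predicted_knn_radius(k, n, d, density):
    """Value around which the k-nearest neighbor radius concentrates."""
    return (k / ((n - 1) * unit_ball_volume(d) * density)) ** (1.0 / d)

rng = np.random.default_rng(0)
n, d, k = 20000, 2, 200                      # hypothetical sample size and k
X = rng.uniform(0.0, 1.0, size=(n, d))       # uniform density p = 1 on the unit square
X[0] = 0.5                                   # one sample point at the centre of the square

dist_to_centre = np.linalg.norm(X[1:] - X[0], axis=1)
empirical_radius = np.sort(dist_to_centre)[k - 1]   # distance to the k-th nearest neighbor
predicted_radius = predicted_knn_radius(k, n, d, density=1.0)
relative_error = abs(empirical_radius - predicted_radius) / predicted_radius
```

For a fixed density the prediction scales like $(k/n)^{1/d}$, which is the decay rate used as the "radius" of the kNN graph.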

For the sequences $(r_n)_{n \in \mathbb{N}}$ and $(\sigma_n)_{n \in \mathbb{N}}$ we always assume $r_n \to 0$, $\sigma_n \to 0$ and $n r_n \to \infty$, $n \sigma_n \to \infty$
for $n \to \infty$. Furthermore, for the parameter sequence $(k_n)_{n \in \mathbb{N}}$ of the kNN graph we always assume
$k_n / n \to 0$, which corresponds to $r_n \to 0$, and $k_n / \log n \to \infty$.

In the rest of this paper we denote by $\mathcal{L}_d$ the Lebesgue measure in $\mathbb{R}^d$. Furthermore, let $B(x, r)$
denote the closed ball of radius $r$ around $x$ and $\eta_d = \mathcal{L}_d(B(0, 1))$, where we set $\eta_0 = 1$.

We make the following general assumptions in the whole paper: 

• The data points $x_1, \ldots, x_n$ are drawn independently from some density $p$ on $\mathbb{R}^d$. The measure
on $\mathbb{R}^d$ that is induced by $p$ is denoted by $\mu$; that means, for a measurable set $A \subseteq \mathbb{R}^d$ we set
$\mu(A) = \int_A p(x)\, dx$.

• The density $p$ is bounded from below and above, that is $0 < p_{\min} \le p(x) \le p_{\max}$. In particular,
it has compact support $C$.

• In the interior of $C$, the density $p$ is twice differentiable and $\|\nabla p(x)\| \le p'_{\max}$ for a $p'_{\max} \in \mathbb{R}$
and all $x$ in the interior of $C$.

• The cut hyperplane $S$ splits the space $\mathbb{R}^d$ into two halfspaces $H^+$ and $H^-$ (both including the
hyperplane $S$) with positive probability masses, that is $\mu(H^+) > 0$, $\mu(H^-) > 0$. The normal
of $S$ pointing towards $H^+$ is denoted by $n_S$.

• If $d \ge 2$ the boundary $\partial C$ is a compact, smooth $(d-1)$-dimensional surface with minimal
curvature radius $\kappa > 0$, that is the absolute values of the principal curvatures are bounded
by $1/\kappa$. We denote by $n_x$ the normal to the surface $\partial C$ at the point $x \in \partial C$. Furthermore,
we can find constants $\gamma > 0$ and $r_\gamma > 0$ such that for all $r \le r_\gamma$ we have $\mathcal{L}_d(B(x, r) \cap C) \ge
\gamma\, \mathcal{L}_d(B(x, r))$ for all $x \in C$.

• If $d \ge 2$ we can find an angle $\alpha \in (0, \pi/2)$ such that $|\langle n_S, n_x \rangle| \le \cos \alpha$ for all $x \in S \cap \partial C$. If
$d = 1$ we assume that (the point) $S$ is in the interior of $C$.

The assumptions on the boundary dC are necessary in order to bound the influence of points that 
are close to the boundary. The problem with these points is that the density is not approximately 
uniform inside small balls around them. Therefore, we cannot find a good estimate of their kNN 
radius and on their contribution to the cut and the volume. Under the assumptions above we can 
neglect these points. 



3 Main results: Limits of the quality measures NCut and 
CheegerCut 

As we can see in Equations (1) and (2) the definitions of NCut and CheegerCut rely on the cut
and the volume. Therefore, in order to study the convergence of NCut and CheegerCut it seems
reasonable to study the convergence of the cut and the volume first. In Section 6 the Corollaries 1-3
and the Corollaries 4-6 state the convergence of the cut and the volume on the kNN graphs. The
Corollaries 7-10 state the convergence of the cut on the r-graph and the complete weighted graph,
whereas the Corollaries 11-14 state the convergence of the volume on the same graphs.

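These corollaries show that there are scaling sequences $(s_n^{\mathrm{cut}})_{n \in \mathbb{N}}$ and $(s_n^{\mathrm{vol}})_{n \in \mathbb{N}}$ that depend on $n$,
$r_n$ and the graph type such that, under certain conditions, almost surely

$$(s_n^{\mathrm{cut}})^{-1}\, \mathrm{cut}_n \to \mathrm{CutLim} \quad \text{and} \quad (s_n^{\mathrm{vol}})^{-1}\, \mathrm{vol}_n(H) \to \mathrm{VolLim}(H)$$

for $n \to \infty$, where $\mathrm{CutLim} \in \mathbb{R}_{>0}$ and $\mathrm{VolLim}(H^+), \mathrm{VolLim}(H^-) \in \mathbb{R}_{>0}$ are constants depending
only on the density $p$ and the hyperplane $S$.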

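Having defined these limits we define, analogously to the definitions in Equations (1) and (2), the
limits of NCut and CheegerCut as

$$\mathrm{NCutLim} = \frac{\mathrm{CutLim}}{\mathrm{VolLim}(H^+)} + \frac{\mathrm{CutLim}}{\mathrm{VolLim}(H^-)} \tag{3}$$

and

$$\mathrm{CheegerCutLim} = \frac{\mathrm{CutLim}}{\min\{\mathrm{VolLim}(H^+), \mathrm{VolLim}(H^-)\}}. \tag{4}$$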

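In our following main theorems we show the conditions under which we have for $n \to \infty$ almost
sure convergence of

$$\frac{s_n^{\mathrm{vol}}}{s_n^{\mathrm{cut}}}\, \mathrm{NCut}_n \to \mathrm{NCutLim} \quad \text{and} \quad \frac{s_n^{\mathrm{vol}}}{s_n^{\mathrm{cut}}}\, \mathrm{CheegerCut}_n \to \mathrm{CheegerCutLim}.$$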
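
The ratio of the two scaling sequences that appears in front of NCut follows directly from the definitions. As a short worked step, using only the cut and volume limits introduced above and assuming both hold almost surely:

$$\frac{s_n^{\mathrm{vol}}}{s_n^{\mathrm{cut}}}\, \mathrm{NCut}_n
= \frac{\mathrm{cut}_n}{s_n^{\mathrm{cut}}} \left( \frac{s_n^{\mathrm{vol}}}{\mathrm{vol}_n(H^+)} + \frac{s_n^{\mathrm{vol}}}{\mathrm{vol}_n(H^-)} \right)
\longrightarrow \mathrm{CutLim} \left( \frac{1}{\mathrm{VolLim}(H^+)} + \frac{1}{\mathrm{VolLim}(H^-)} \right)
= \mathrm{NCutLim}.$$

The same computation with the minimum of the two volumes in the denominator gives the corresponding statement for CheegerCut.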



Furthermore, for the unweighted r-graph and kNN-graph and for the complete weighted graph 
with Gaussian weights we state the optimal convergence rates, where "optimal" means the best 
trade-off between our bounds for different quantities derived in Section 6. Note that we will not
prove the following theorems here. Rather the proof of Theorem 1 can be found in Section 6.2.4,
whereas the proofs of Theorems 2 and 3 can be found in Section 6.3.3.

The cut in the kNN-graph and the r-graph

Weighting | $s_n^{\mathrm{cut}}$ | CutLim kNN-graph | CutLim r-graph
unweighted | $n^2 r_n^{d+1}$ | $\frac{2\eta_{d-1}}{(d+1)\,\eta_d^{1+1/d}} \int_S p^{1-1/d}(s)\, ds$ | $\frac{2\eta_{d-1}}{d+1} \int_S p^2(s)\, ds$
weighted, $r_n/\sigma_n \to \infty$ | $n^2 \sigma_n$ | $\frac{2}{\sqrt{2\pi}} \int_S p^2(s)\, ds$ | $\frac{2}{\sqrt{2\pi}} \int_S p^2(s)\, ds$
weighted, $r_n/\sigma_n \to 0$ | $\sigma_n^{-d} n^2 r_n^{d+1}$ | $\frac{2\eta_{d-1}\,\eta_d^{-1-1/d}}{(d+1)(2\pi)^{d/2}} \int_S p^{1-1/d}(s)\, ds$ | $\frac{2\eta_{d-1}}{(d+1)(2\pi)^{d/2}} \int_S p^2(s)\, ds$

The cut in the complete weighted graph

Weighting | $s_n^{\mathrm{cut}}$ | CutLim in complete weighted graph
weighted | $n^2 \sigma_n$ | $\frac{2}{\sqrt{2\pi}} \int_S p^2(s)\, ds$

The volume in the kNN-graph and the r-graph

Weighting | $s_n^{\mathrm{vol}}$ | VolLim(H) kNN-graph | VolLim(H) r-graph
unweighted | $n^2 r_n^d$ | $\int_H p(x)\, dx$ | $\eta_d \int_H p^2(x)\, dx$
weighted, $r_n/\sigma_n \to \infty$ | $n^2$ | $\int_H p^2(x)\, dx$ | $\int_H p^2(x)\, dx$
weighted, $r_n/\sigma_n \to 0$ | $\sigma_n^{-d} n^2 r_n^d$ | $\frac{1}{(2\pi)^{d/2}} \int_H p(x)\, dx$ | $\frac{\eta_d}{(2\pi)^{d/2}} \int_H p^2(x)\, dx$

The volume in the complete weighted graph

Weighting | $s_n^{\mathrm{vol}}$ | VolLim(H) in complete weighted graph
weighted | $n^2$ | $\int_H p^2(x)\, dx$

Table 1: The scaling sequences and limit expressions for the cut and the volume in all the considered
graph types. In the limit expressions for the cut the integral denotes the $(d-1)$-dimensional surface
integral along the hyperplane $S$, whereas in the limit expressions for the volume the integral denotes
the Lebesgue integral over the halfspace $H = H^+$ or $H = H^-$.



Theorem 1 (NCut and CheegerCut on the kNN-graph) For a sequence $(k_n)_{n \in \mathbb{N}}$ with $k_n / n \to 0$
for $n \to \infty$ let $G_n$ be the $k_n$-nearest neighbor graph on the sample $x_1, \ldots, x_n$. Set XCut = NCut or
XCut = CheegerCut and let XCutLim denote the corresponding limit as defined in Equations (3)
and (4). Set

$$\Delta_n = \left| \frac{s_n^{\mathrm{vol}}}{s_n^{\mathrm{cut}}}\, \mathrm{XCut}_n - \mathrm{XCutLim} \right|.$$

• Let $G_n$ be the unweighted kNN graph. If $k_n / \sqrt{n \log n} \to \infty$ in the case $d = 1$ and $k_n / \log n \to \infty$
in the case $d \ge 2$ we have $\Delta_n \to 0$ for $n \to \infty$ almost surely. The optimal convergence
rate is achieved for $k_n = k_0 \sqrt[3]{n^2 \log n}$ in the case $d = 1$ and $k_n = k_0\, n^{2/(d+2)} (\log n)^{d/(d+2)}$ in
the case $d \ge 2$. For this choice of $k_n$ we have $\Delta_n = O(\sqrt[d+2]{\log n / n})$ in the case $d = 1$ and
$\Delta_n = O(\sqrt[d+2]{\log n / n})$ for $d \ge 2$.

• Let $G_n$ be the kNN-graph with Gaussian weights and suppose $r_n \ge \sigma_n^{\alpha}$ for an $\alpha \in (0, 1)$. Then
we have almost sure convergence of $\Delta_n \to 0$ for $n \to \infty$ if $k_n / \log n \to \infty$ and
$n \sigma_n^{d+1} / \log n \to \infty$.

• Let $G_n$ be the kNN-graph with Gaussian weights and $r_n / \sigma_n \to 0$. Then we have almost sure
convergence of $\Delta_n \to 0$ for $n \to \infty$ if $k_n / \sqrt{n \log n} \to \infty$ in the case $d = 1$ and $k_n / \log n \to \infty$
in the case $d \ge 2$.



Theorem 2 (NCut and CheegerCut on the r-graph) For a sequence $(r_n)_{n \in \mathbb{N}} \subset \mathbb{R}_{>0}$ with $r_n \to 0$
for $n \to \infty$ let $G_n$ be the $r_n$-neighborhood graph on the sample $x_1, \ldots, x_n$. Set XCut = NCut or
XCut = CheegerCut and let XCutLim denote the corresponding limit as defined in Equations (3)
and (4). Set

$$\Delta_n = \left| \frac{s_n^{\mathrm{vol}}}{s_n^{\mathrm{cut}}}\, \mathrm{XCut}_n - \mathrm{XCutLim} \right|.$$

• Let $G_n$ be unweighted. Then $\Delta_n \to 0$ almost surely for $n \to \infty$ if $n r_n^{d+1} / \log n \to \infty$. The
optimal convergence rate is achieved for $r_n = r_0 \sqrt[d+3]{\log n / n}$ for a suitable constant $r_0 > 0$.
For this choice of $r_n$ we have $\Delta_n = O(\sqrt[d+3]{\log n / n})$.

• Let $G_n$ be weighted with Gaussian weights with bandwidth $\sigma_n \to 0$ and $r_n / \sigma_n \to \infty$ for
$n \to \infty$. Then $\Delta_n \to 0$ almost surely for $n \to \infty$ if $n \sigma_n^{d+1} / \log n \to \infty$.

• Let $G_n$ be weighted with Gaussian weights with bandwidth $\sigma_n \to 0$ and $r_n / \sigma_n \to 0$ for $n \to \infty$.
Then $\Delta_n \to 0$ almost surely for $n \to \infty$ if $n r_n^{d+1} / \log n \to \infty$.



The following theorem presents the limit results for NCut and CheegerCut on the complete weighted 
graph. One result that we need in the proof of this theorem is Corollary 8 on the convergence of
the cut. Note that in Narayanan et al. (2007) a similar cut convergence problem is studied for
the case of the complete weighted graph, and the scaling sequence and the limit differ from ours.
However, the reason is that in that paper the weighted cut is considered, which can be written as
$f' L_{\mathrm{norm}} f$, where $L_{\mathrm{norm}}$ denotes the normalized graph Laplacian matrix and $f$ is an n-dimensional
vector with $f_i = 1$ if $x_i$ is in one cluster and $f_i = 0$ if $x_i$ is in the other cluster. On the other hand,
the standard cut, which we consider in this paper, can be written (up to a constant) as $f' L_{\mathrm{unnorm}} f$,
where $L_{\mathrm{unnorm}}$ denotes the unnormalized graph Laplacian matrix. (For the definitions of the graph
Laplacian matrices and their relationship to the cut we refer the reader to von Luxburg (2007).) 
Therefore, the two results do not contradict each other. 
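
The relation between the standard cut and the unnormalized graph Laplacian can be verified numerically. The sketch below assumes a symmetric weight matrix with random entries (a hypothetical example, not data from the paper) and the usual definition $L_{\mathrm{unnorm}} = D - W$ with the diagonal degree matrix $D$; under the convention of counting each undirected edge twice, the constant relating the two quantities is two.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
W = rng.random((n, n))
W = (W + W.T) / 2.0                 # symmetric weights of an undirected graph
np.fill_diagonal(W, 0.0)            # no loops

f = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)   # cluster indicator vector
in_U = f == 1

L_unnorm = np.diag(W.sum(axis=1)) - W                  # unnormalized graph Laplacian
quadratic_form = f @ L_unnorm @ f

# Standard cut with both directions of every crossing edge counted.
cut = W[np.ix_(in_U, ~in_U)].sum() + W[np.ix_(~in_U, in_U)].sum()
```

Here `quadratic_form` sums the weight of every crossing edge once, whereas `cut` counts it in both directions.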



Theorem 3 (NCut and CheegerCut on the complete weighted graph) Let $G_n$ be the
complete weighted graph with Gaussian weights and bandwidth $\sigma_n$ on the sample points $x_1, \ldots, x_n$. Set
XCut = NCut or XCut = CheegerCut and let XCutLim denote the corresponding limit as defined
in Equations (3) and (4). Set

$$\Delta_n = \left| \frac{s_n^{\mathrm{vol}}}{s_n^{\mathrm{cut}}}\, \mathrm{XCut}_n - \mathrm{XCutLim} \right|.$$

Under the conditions $\sigma_n \to 0$ and $n \sigma_n^{d+1} / \log n \to \infty$ we have almost surely $\Delta_n \to 0$ for $n \to \infty$.
The optimal convergence rate is achieved setting $\sigma_n = \sigma_0 \sqrt[d+3]{\log n / n}$ with a suitable $\sigma_0 > 0$. For
this choice of $\sigma_n$ the convergence rate is in $O(((\log n)/n)^{\alpha/(d+3)})$ for any $\alpha \in (0, 1)$.
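
To get a feeling for the parameter choices in the theorems, the following sketch evaluates the rate-optimal sequences for a given sample size. The constants $k_0$, $r_0$ and $\sigma_0$ are not specified by the theorems and default to 1 here purely for illustration; the function names are hypothetical, and the root indices $d+2$ and $d+3$ are our reading of the theorem statements. For $d = 1$ the expression for $k_n$ reduces to the cube root of $n^2 \log n$.

```python
import math

def optimal_k(n, d, k0=1.0):
    """k_n = k0 * n^(2/(d+2)) * (log n)^(d/(d+2)) for the unweighted kNN graph."""
    return k0 * n ** (2.0 / (d + 2)) * math.log(n) ** (d / (d + 2.0))

def optimal_radius(n, d, r0=1.0):
    """r_n = r0 * (log n / n)^(1/(d+3)) for the unweighted r-graph."""
    return r0 * (math.log(n) / n) ** (1.0 / (d + 3))

def optimal_bandwidth(n, d, sigma0=1.0):
    """sigma_n = sigma0 * (log n / n)^(1/(d+3)) for the complete Gaussian graph."""
    return sigma0 * (math.log(n) / n) ** (1.0 / (d + 3))

def knn_radius_scale(n, d, k0=1.0):
    """The kNN 'radius' r_n = (k_n / n)^(1/d) implied by the choice of k_n."""
    return (optimal_k(n, d, k0) / n) ** (1.0 / d)

n = 10000
choices = {d: (optimal_k(n, d), optimal_radius(n, d), optimal_bandwidth(n, d))
           for d in (1, 2, 5)}
```

With $k_0 = 1$ the implied kNN radius equals $(\log n / n)^{1/(d+2)}$, which is the same order as the convergence rate stated for the unweighted kNN graph.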



Let us interpret these results and for simplicity focus on the cut value. When we compare the limits
of the cut (cf. Table 1) it is striking that, depending on the graph type and the weighting scheme,
there are two substantially different limits: the limit $\int_S p^2(s)\, ds$ for the unweighted r-neighborhood
graph, and the limit $\int_S p^{1-1/d}(s)\, ds$ for the unweighted k-nearest neighbor graph.

The limit of the cut for the complete weighted graph with Gaussian weights is the same as the 
limit for the unweighted r-neighborhood graph. There is a simple reason for that: On both graph 
types the weight of an edge only depends on the distance between its end points, no matter where 
the points are. This is in contrast to the kNN-graph, where the radius up to which a point is 
connected strongly depends on its location: If a point is in a region of high density there will be 
many other points close by, which means that the radius is small. On the other hand, this radius 
is large for points in low-density regions. Furthermore, the Gaussian weights decline very rapidly
with the distance, depending on the parameter $\sigma$. That is, $\sigma$ plays a similar role as the radius $r$
for the r-neighborhood graph.

The two types of r-neighborhood graphs with Gaussian weights have the same limit as the
unweighted r-neighborhood graph and the complete weighted graph with Gaussian weights. When
we compare the scaling sequences $s_n^{\mathrm{cut}}$ it turns out that in the case $r_n / \sigma_n \to \infty$ this sequence is the
same as for the complete weighted graph, whereas in the case $r_n / \sigma_n \to 0$ we have $s_n^{\mathrm{cut}} = n^2 r_n^{d+1} / \sigma_n^d$,
which is the same sequence as for the unweighted r-graph corrected by a factor of $\sigma_n^{-d}$. In fact,
these effects are easy to explain: If $r_n / \sigma_n \to \infty$ then the edges which we have to remove from the
complete weighted graph in order to obtain the $r_n$-neighborhood graph have a very small weight
and their contribution to the value of the cut can be neglected. Therefore this graph behaves like
the complete weighted graph with Gaussian weights. On the other hand, if $r_n / \sigma_n \to 0$ then all the
edges that remain in the $r_n$-neighborhood graph have approximately the same weight, namely the
maximum of the Gaussian weight function, which is linear in $\sigma_n^{-d}$.

Figure 1: Densities in the examples (panels: Density example 1, Density example 2). In the two-dimensional case, we plot the informative dimension
(marginal over the other dimensions) only. The dashed blue vertical line depicts the optimal limit
cut of the r-graph, the solid red vertical line the optimal limit cut of the kNN graph.

Similar effects can be observed for the k-nearest neighbor graphs. The limits of the unweighted
graph and the graph with Gaussian weights and $r_n / \sigma_n \to 0$ are identical (up to constants) and the
scaling sequence has to correct for the maximum of the Gaussian weight function. However, the
limit for the kNN-graph with Gaussian weights and $r_n / \sigma_n \to \infty$ is different: In fact, we have the
same limit expression as for the complete weighted graph with Gaussian weights. The reason for
this is the following: Since $r_n$ is large compared to $\sigma_n$, at some point all the k-nearest neighbor
radii of the sample points are very large. Therefore, all the edges that are in the complete weighted
graph but not in the kNN graph have very low weights and thus the limit of this graph behaves
like the limit of the complete weighted graph with Gaussian weights.

Finally, we would like to discuss the difference between the two limit expressions, where as examples 
for the graphs we use only the unweighted r-neighborhood graph and the unweighted kNN-graph. 
Of course, the results can be carried over to the other graph types. For the cut we have the limits 
$\int_S p^{1-1/d}(s)\, ds$ and $\int_S p^2(s)\, ds$. In dimension 1 the difference between these expressions is most
pronounced: The limit for the kNN graph does not depend on the density $p$ at all, whereas in the
limit for the r-graph the exponent of $p$ is 2, independent of the dimension. Generally, the limit for
the r-graph seems to be more sensitive to the absolute value of the density. This can also be seen
for the volume: The limit expression for the kNN graph is $\int_H p(x)\, dx$, which does not depend on
the absolute value of the density at all, but only on the probability mass in the halfspace $H$. This
is different for the unweighted r-neighborhood graph with the limit expression $\int_H p^2(x)\, dx$.
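
The effect of the different exponents can be made concrete in dimension 1, where a "hyperplane" is a single cut point $t$. The sketch below evaluates both limit criteria on a grid, dropping the constant factors because they do not change the minimizer: for the kNN graph the cut term is $p^{0}(t) = 1$ and the volume is the probability mass on each side, for the r-graph the cut term is $p^2(t)$ and the volume is the integral of $p^2$ on each side. The density is a hypothetical two-mode Gaussian mixture that is set to zero below a threshold and renormalised; the parameter values and the reading of the spread parameters as standard deviations are assumptions made for this illustration.

```python
import numpy as np

def gaussian_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def truncated_mixture(x, mus, sds, alphas, theta):
    """Mixture density, set to zero below theta and renormalised on the grid x."""
    p = sum(a * gaussian_pdf(x, m, s) for m, s, a in zip(mus, sds, alphas))
    p = np.where(p >= theta, p, 0.0)
    return p / (p.sum() * (x[1] - x[0]))

def limit_ncut_1d(x, p, cut_power, vol_power):
    """Limit NCut, up to a constant factor, for a cut at every grid point of the support."""
    dx = x[1] - x[0]
    support = np.flatnonzero(p > 0)
    q = p[support] ** vol_power
    vol_left = np.cumsum(q) * dx                 # integral of p^vol_power left of the cut
    vol_right = np.cumsum(q[::-1])[::-1] * dx    # integral of p^vol_power right of the cut
    cut = p[support] ** cut_power
    return x[support], cut * (1.0 / vol_left + 1.0 / vol_right)

x = np.linspace(0.0, 0.6, 6001)
p = truncated_mixture(x, mus=(0.2, 0.4), sds=(0.05, 0.03), alphas=(0.8, 0.2), theta=0.1)

t, ncut_knn = limit_ncut_1d(x, p, cut_power=0.0, vol_power=1.0)   # kNN graph, d = 1
_, ncut_r = limit_ncut_1d(x, p, cut_power=2.0, vol_power=2.0)     # r-graph
t_knn = t[np.argmin(ncut_knn)]
t_r = t[np.argmin(ncut_r)]
```

In this sketch the kNN criterion depends only on the probability mass on each side and is therefore minimized at the median of the density, while the r-graph criterion is minimized in the low-density valley between the two modes.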



4 Examples where different limits of Ncut lead to different 
optimal cuts 

In Theorems 1-3 we have seen that the limit expressions for NCut and CheegerCut are different
for different kinds of neighborhood graphs. In fact, apart from constants there are two limit
expressions: that of the unweighted kNN-graph, where the exponent of the density $p$ in the limit
integral for the cut is $1 - 1/d$ and for the volume is 1, and that of the unweighted r-neighborhood
graph, where the exponent in the limit of the cut is 2 and in the limit of the volume is 2. Therefore,
we consider here only the unweighted kNN-graph and the unweighted r-neighborhood graph.

In this section we show that the difference between the limit expressions is more than a
mathematical subtlety without practical relevance: If we select an optimal cut based on the limit criterion
for the kNN graph we can obtain a different result than if we use the limit criterion based on the 
r-neighborhood graph. 

Consider Gaussian mixture distributions in one (Example 1) and in two dimensions (Example 2)
of the form $\sum_{i=1}^{3} \alpha_i N([\mu_i, 0, \ldots, 0], \sigma_i I)$ which are set to zero where they are below a threshold $\theta$
and properly rescaled. The specific parameters in one and two dimensions are

      | $\mu_1$ | $\mu_2$ | $\mu_3$ | $\sigma_1$ | $\sigma_2$ | $\sigma_3$ | $\alpha_1$ | $\alpha_2$ | $\alpha_3$ | $\theta$
1 dim | [illegible] | [illegible] | [illegible] | 0.4 | 0.1 | 0.1 | 0.66 | 0.17 | 0.17 | 0.1
2 dim | -1.1 | 0 | 1.3 | 0.2 | 0.4 | 0.1 | 0.4 | 0.55 | 0.05 | 0.01

Figure 2: Results of spectral clustering in two dimensions, for the unweighted r-graph (left) and
the unweighted kNN graph (right)




Plots of the densities of Example 1 and 2 can be seen in Figure 1. We first investigate the theoretical
limit cut values, for hyperplanes which cut perpendicular to the first dimension (which is the
"informative" dimension of the data). For the chosen densities, the limit NCut expressions from
Theorems 1 and 2 can be computed analytically and optimized over the chosen hyperplanes. The
solid red line in Figure 1 indicates the position of the minimal value for the kNN-graph case,
whereas the dashed blue line indicates the position of the minimal value for the r-graph case.

Up to now we only compared the limits of different graphs with each other, but the question
is whether the effects of these limits can be observed even for finite sample sizes. In order to
investigate this question we applied normalized spectral clustering (cf. von Luxburg (2007)) to
sample data sets of $n = 2000$ points from the mixture distribution above. We used the unweighted
r-graph and the unweighted symmetric k-nearest neighbor graph. We tried a range of reasonable
values for the parameters k and r and the results we obtained were stable over a range of parameters.
Here we present the results for the 30- (for d = 1) and the 150-nearest neighbor graphs (for d = 2)
and the r-graphs with corresponding parameter r, that is r was set to be the mean 30- and 150-
nearest neighbor radius. Different clusterings are compared using the minimal matching distance:



$$d_{MM}(\mathrm{Clust}_1, \mathrm{Clust}_2) = \min_{\pi} \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\mathrm{Clust}_1(x_i) \neq \pi(\mathrm{Clust}_2(x_i))},$$

where the minimum is taken over all permutations $\pi$ of the labels. In the case of two clusters,
this distance corresponds to the 0-1-loss as used in classification: a minimal matching distance of 
0.35, say, means that 35% of the data points lie in different clusters. In our spectral clustering 
experiment, we could observe that the clusterings obtained by spectral clustering are usually very 
close to the theoretically optimal hyperplane splits predicted by theory (the minimal matching 
distances to the optimal hyperplane splits were always in the order of 0.03 or smaller). As predicted 
by theory, the two types of graph give different cuts in the data. An illustration of this phenomenon 
for the case of dimension 2 can be found in Figure 2. To give a quantitative evaluation of this
phenomenon, we computed the mean minimal matching distances between clusterings obtained by
the same type of graph over the different samples (denoted $d_{kNN}$ and $d_r$), and the mean difference
$d_{kNN-r}$ between the clusterings obtained by different graph types:



Example    d_kNN              d_r                d_kNN-r
1 dim      0.0005 ± 0.0006    0.0003 ± 0.0004    0.346 ± 0.063
2 dim      0.005 ± 0.0023     0.001 ± 0.001      0.49 ± 0.01



Figure 3: Example 3 with the sum of two Gaussians, that is, two modes of the density. The left
figure depicts the density with the optimal limit cut of the r-graph (dashed blue vertical line) and
the optimal limit cut of the kNN graph (solid red vertical line). The two figures on the right show
the histograms of the cluster boundary over 100 iterations for the unweighted r-neighborhood and
kNN graphs.

We can see that for the same graph, the clustering results are very stable (differences in the
order of $10^{-3}$), whereas the differences between the kNN graph and the r-neighborhood graph are
substantial (0.35 and 0.49, respectively). This difference is exactly the one induced by assigning
the middle mode of the density to different clusters, which is the effect predicted by theory.
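
The minimal matching distance defined above can be computed directly when the number of clusters is small. The following Python sketch is an illustration only and not the code used for the experiments; the function name `minimal_matching_distance` and the brute-force search over all label permutations are assumptions made here for clarity (for many clusters one would rather solve an assignment problem).

```python
from itertools import permutations


def minimal_matching_distance(labels_a, labels_b):
    """Fraction of points labelled differently, minimized over relabelings of labels_b."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both clusterings must label the same non-empty point set")
    n = len(labels_a)
    label_set = sorted(set(labels_a) | set(labels_b))
    best = 1.0
    # Try every permutation pi of the labels and keep the smallest mismatch rate.
    for perm in permutations(label_set):
        pi = dict(zip(label_set, perm))
        mismatches = sum(1 for a, b in zip(labels_a, labels_b) if a != pi[b])
        best = min(best, mismatches / n)
    return best
```

For two clusters only two permutations are tried, so the value is the smaller of the 0-1-loss and one minus the 0-1-loss.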

It is tempting to conjecture that in Examples 1 and 2 the two different limit solutions and their
impact on spectral clustering arise because the number of Gaussians and the
number of clusters we are looking for do not coincide. Yet the following Example 3 shows that
this is not the case: for a density in one dimension as above but with only two Gaussians with
parameters



μ1    μ2   |  σ1     σ2    |  α1    α2
0.2   0.4  |  0.05   0.03  |  0.8   0.2  |  0.1



the same effects can be observed. The density is depicted in the left plot of Figure 3.

In this example we drew a sample of 2000 points from this density and computed the spectral
clustering of the points, once with the unweighted kNN-graph and once with the unweighted
r-graph. In one dimension we can compute the location of the boundary between two clusters, that is,
the midpoint between the rightmost point of the left cluster and the leftmost point of the right cluster.
We did this for 100 iterations and plotted histograms of the location of the cluster boundary. In
the middle and the right plot of Figure 3 we see that these coincide with the optimal cut predicted
by theory.
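
The boundary computation described above can be sketched in a few lines of Python. This is a hypothetical helper written for illustration (the name `cluster_boundary_1d` is ours); it assumes exactly two clusters and decides which one is the "left" cluster by comparing cluster means, which is only meaningful if the clusters do not interleave.

```python
def cluster_boundary_1d(points, labels):
    """Midpoint between the rightmost point of the left cluster
    and the leftmost point of the right cluster."""
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    if len(clusters) != 2:
        raise ValueError("expected exactly two clusters")
    first, second = clusters.values()
    # Assumption: the cluster with the smaller mean is the left one.
    left, right = sorted((first, second), key=lambda c: sum(c) / len(c))
    return (max(left) + min(right)) / 2.0
```

Collecting this value over repeated samples gives the kind of histogram shown in the middle and right plots of Figure 3.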



5 Outlook 

In this paper we have investigated the influence of the graph construction on the graph-based 
clustering measures normalized cut and Cheeger cut. We have seen that depending on the type of 
graph and the weights, the clustering quality measures converge to different limit results. 

This means that ultimately, the question about the "best NCut" or "best Cheeger cut" clustering,
given an infinite amount of data, has different answers, depending on which underlying graph we
use. This observation opens Pandora's box on clustering criteria: the "meaning" of a clustering
criterion does not only depend on the exact definition of the criterion itself, but also on how the
graph on the finite sample is constructed. This means that one graph clustering quality measure is
not just "one well-defined criterion" on the underlying space, but corresponds to a whole family
of criteria, which differ depending on the underlying graph. Put more loosely: a clustering quality
measure applied to one neighborhood graph does something different in terms of partitions of the
underlying space than the same quality measure applied to a different neighborhood graph. This
shows that these criteria cannot be studied in isolation from the graph they are applied to.

From a theoretical side, there are several directions in which our work can be improved. In
this paper we only consider partitions of Euclidean space that are defined by hyperplanes. This
restriction is made in order to keep the proofs reasonably simple. However, we are confident that
similar results could be proven for arbitrary smooth surfaces.

Another extension would be to obtain uniform convergence results. Here one has to take care that
one uses a suitably restricted class of candidate surfaces S (note that uniform convergence results
over the set of all partitions of $\mathbb{R}^d$ are impossible, cf. Bubeck and von Luxburg (2009)). This
result would be especially useful if there existed a practically applicable algorithm to compute the
optimal surface out of the set of all candidate surfaces.

For practice, it will be important to study how the different limit results influence clustering results. 
So far, we do not have much intuition about when the different limit expressions lead to different 
optimal solutions, and when these solutions will show up in practice. The examples we provided 
above already show that different graphs indeed can lead to systematically different clusterings in 
practice. Gaining more understanding of this effect will be an important direction of research if 
one wants to understand the nature of different graph clustering quality measures. 



6 Proofs 

In many of the proofs that follow in this section a lot of technique is involved in order to
come to terms with problems that arise due to effects at the boundary of our support C and to the
non-uniformity of the density p. However, if these technicalities are ignored, the basic ideas of the
proofs are simple to explain and they are similar for the different types of neighborhood graphs.
In Section 6.1 we discuss these ideas without the technical overhead and define some quantities
that are necessary for the formulation of our results.

In Section 6.2 we present the results for the k-nearest neighbor graph and in Section 6.3 we present
those for the r-graph and the complete weighted graph. Each of these sections consists of three
parts: the first is devoted to the cut, the second is devoted to the volume, and in the third we
prove the main theorem for the considered graphs using the results for the cut and the volume.

The sections on the convergence of the cut and the volume always follow the same scheme: First, 
a proposition concerning the convergence of the cut or the volume for general monotonically de- 
creasing weight functions is given. Using this general proposition the results for the specific weight 
functions we consider in this paper follow as corollaries. 

Since the basic ideas of our proofs are the same for all the different graphs, it is not worth repeating
the same steps for all the graphs. Therefore, we decided to give detailed proofs for the k-nearest
neighbor graph, which is the most difficult case. The r-neighborhood graph and the complete
weighted graph can be treated together and we mainly discuss the differences to the proof for the
kNN graph.

The limits of the cut and the volume for general weight functions are expressed in terms of certain
integrals of the weight function over "caps" and "balls", which are explained later. For a specific
weight function these integrals have to be evaluated. This is done in the lemmas in Section 6.4.
Furthermore, this section contains a technical lemma that helps us to control boundary effects.



6.1 Basic ideas 

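In this section we present the ideas of our convergence proofs informally. We focus here on
NCut, but all the ideas can easily be carried over to the Cheeger cut.

First step: Decompose $\mathrm{NCut}_n$ into $\mathrm{cut}_n$ and $\mathrm{vol}_n$

Under our general assumptions there exist constants $C_1, C_2, C_3$, which may depend on the limit
values of the cut and the volume, such that for sufficiently large n

$$\left| \frac{s_n^{\mathrm{vol}}}{s_n^{\mathrm{cut}}} \left( \frac{\mathrm{cut}_n}{\mathrm{vol}_n(H^+)} + \frac{\mathrm{cut}_n}{\mathrm{vol}_n(H^-)} \right) - \left( \frac{\mathrm{CutLim}}{\mathrm{VolLim}(H^+)} + \frac{\mathrm{CutLim}}{\mathrm{VolLim}(H^-)} \right) \right|$$
$$\le C_1 \underbrace{\left| \frac{\mathrm{cut}_n}{s_n^{\mathrm{cut}}} - \mathrm{CutLim} \right|}_{\text{cut-term}} + C_2 \underbrace{\left| \frac{\mathrm{vol}_n(H^+)}{s_n^{\mathrm{vol}}} - \mathrm{VolLim}(H^+) \right|}_{\text{volume-term}} + C_3 \underbrace{\left| \frac{\mathrm{vol}_n(H^-)}{s_n^{\mathrm{vol}}} - \mathrm{VolLim}(H^-) \right|}_{\text{volume-term}}.$$

Second step: Bias/variance decomposition of cut and volume terms

In order to show the convergence of the cut-term we do a bias/variance decomposition

$$\left| \frac{\mathrm{cut}_n}{s_n^{\mathrm{cut}}} - \mathrm{CutLim} \right| \le \underbrace{\left| \frac{\mathrm{cut}_n}{s_n^{\mathrm{cut}}} - E\left( \frac{\mathrm{cut}_n}{s_n^{\mathrm{cut}}} \right) \right|}_{\text{variance term}} + \underbrace{\left| E\left( \frac{\mathrm{cut}_n}{s_n^{\mathrm{cut}}} \right) - \mathrm{CutLim} \right|}_{\text{bias term}}$$

and show the convergence to zero of these terms separately. Clearly, the same decomposition can
be done for the volume terms. In the following we call these terms the "bias term of the cut" and
the "variance term of the cut" and similarly for the volume.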

For both, the cut and the volume, there is one result in this section dealing with the convergence 
properties of the bias term and the variance term on each particular graph type and weighting 
scheme. 

Third step: Use concentration of measure inequalities for the variance term 

Bounding the deviation of a random variable from its expectation is a well-studied problem in
statistics, and there are several so-called concentration of measure inequalities that bound
the probability of a large deviation from the mean. In this paper we use McDiarmid's inequality
for the kNN graphs and a concentration of measure result for U-statistics by Hoeffding for the
r-neighborhood graph and the complete weighted graph. The reason for this is that each of the
graph types has its particular advantages and disadvantages when it comes to the prerequisites for
the concentration inequalities: The advantage of the kNN graph is that we can bound the degree
of a node linearly in the parameter k, whereas for the r-neighborhood graph we can bound the
degree only by the trivial bound (n - 1), and for the complete graph this bound is even attained.
Therefore, using the same proof as for the kNN-graph is suboptimal for the latter two graphs. On
the other hand, in these graphs the connectivity between points is not random given their position
and it is always symmetric. This allows us to use a U-statistics argument, which cannot be applied
to the kNN-graph, since the connectivity there may be asymmetric (at least for the directed one)
and the connectivity between any two points depends on all the sample points.
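
The difference in degree behavior can be seen on a small sample. The following Python sketch is illustrative only: the helper names are hypothetical, and the symmetric kNN graph is taken here to connect two points if either is among the k nearest neighbors of the other, which is the usual convention but is an assumption as far as this section is concerned.

```python
import math


def knn_lists(points, k):
    """Directed kNN graph: for each point, the index set of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def symmetric_knn_graph(points, k):
    """Undirected graph with an edge {i, j} if i is a kNN of j or j is a kNN of i."""
    directed = knn_lists(points, k)
    adj = [set(nb) for nb in directed]
    for i, nb in enumerate(directed):
        for j in nb:
            adj[j].add(i)
    return adj


def r_graph(points, r):
    """Undirected graph with an edge {i, j} if dist(x_i, x_j) <= r."""
    adj = [set() for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

In the directed kNN graph every out-degree equals k by construction, whereas in the r-graph a radius larger than the diameter of the sample connects every point to all n - 1 others, so the trivial bound is attained.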

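Note that these results are of a probabilistic nature, that is, we obtain results of the form

$$\Pr\left( \left| \frac{\mathrm{cut}_n}{s_n^{\mathrm{cut}}} - E\left( \frac{\mathrm{cut}_n}{s_n^{\mathrm{cut}}} \right) \right| > \varepsilon \right) \le p_n$$

for a sequence $(p_n)$ of non-negative real numbers. If for all $\varepsilon > 0$ the sum $\sum_{i=1}^{\infty} p_i$ is finite, then
we have almost sure convergence of the variance term to zero by the Borel-Cantelli lemma.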

Fourth step: Bias of the cut term 

While all steps so far were fairly standard, this part is technically the most challenging
part of our convergence proof. We have to prove the convergence of $E(\mathrm{cut}_n / s_n^{\mathrm{cut}})$ to CutLim
(and similarly for the volume). Omitting all technical difficulties like boundary effects and the
variability of the density, the basic ideas can be described in a rather simple manner.

The first idea is to break the cut down into the contributions of each single edge. We define a
random variable $W_{ij}$ that attains the weight of the edge between $x_i$ and $x_j$ if these points are
connected in the graph and on different sides of the hyperplane S, and zero otherwise. By the
linearity of the expectation and the fact that the points are sampled i.i.d.

$$E(\mathrm{cut}_n) = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} E(W_{ij}) = n(n-1)\, E(W_{12}).$$



Now we fix the positions of the points $x_1 = x$ and $x_2 = y$. In this case $W_{12}$ can attain only two
values: $f_n(\mathrm{dist}(x, y))$ if the points are connected and on different sides of S, and zero otherwise.
We first consider the r-neighborhood graph with parameter $r_n$, since here the existence of an edge
between two points is determined by their distance, and is not random as in the kNN graph. Two
points are connected if their distance is not greater than $r_n$ and thus $W_{12} = 0$ if $\mathrm{dist}(x, y) > r_n$.
Furthermore, $W_{12} = 0$ if x and y are on the same side of S. That is, for a point $x \in H^+$ we have

$$E(W_{12} \mid x_1 = x, x_2 = y) = \begin{cases} f_n(\mathrm{dist}(x, y)) & \text{if } y \text{ is in the cap } B(x, r_n) \cap H^- \\ 0 & \text{otherwise.} \end{cases}$$

By integrating over $\mathbb{R}^d$ we obtain

$$E(W_{12} \mid x_1 = x) = \int_{B(x, r_n) \cap H^-} f_n(\mathrm{dist}(x, y))\, p(y)\, dy$$

and denote the integral on the right hand side in the following by g(x).
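
On a finite sample, the decomposition of the cut into the edge variables $W_{ij}$ can be written out directly for the r-neighborhood graph. The Python sketch below is an illustration under assumptions of our own: the hyperplane is given as $\{x : \langle \mathrm{normal}, x \rangle = \mathrm{offset}\}$, the function name `cut_r_graph` is hypothetical, and the default weight is the unit weight.

```python
import math


def cut_r_graph(points, r, normal, offset, weight=lambda u: 1.0):
    """Sum of W_ij over all ordered pairs i != j in the r-graph.

    W_ij is weight(dist(x_i, x_j)) if the two points are within distance r
    and on different sides of the hyperplane, and zero otherwise.
    """
    def on_positive_side(x):
        return sum(a * b for a, b in zip(normal, x)) - offset > 0

    total = 0.0
    for i, xi in enumerate(points):
        for j, xj in enumerate(points):
            if i == j:
                continue
            d = math.dist(xi, xj)
            if d <= r and on_positive_side(xi) != on_positive_side(xj):
                total += weight(d)
    return total
```

Because the sum runs over ordered pairs, every undirected edge that crosses the hyperplane contributes twice, which matches the double sum over i and j used in the text.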

Integrating the conditional expectation over all possible positions of the point x in $\mathbb{R}^d$ gives

$$E(W_{12}) = \int_{\mathbb{R}^d} g(x)\, p(x)\, dx = \int_{H^+} g(x)\, p(x)\, dx + \int_{H^-} g(x)\, p(x)\, dx.$$

We only consider the integral over the halfspace $H^+$ here, since the other integral can be treated
analogously. The important idea in the evaluation of this integral is the following: Instead of
integrating over $H^+$ directly, we first integrate over the hyperplane S and then, at each point $s \in S$,
along the normal line through s, that is, the line $s + t n_s$ for all $t \in \mathbb{R}_{\ge 0}$. This leads to

$$\int_{H^+} g(x)\, p(x)\, dx = \int_S \int_0^{\infty} g(s + t n_s)\, p(s + t n_s)\, dt\, ds.$$




Figure 4: Integration along the normal line through s. Obviously, for $t > r_n$ the intersection
$B(s + t n_s, r_n) \cap H^-$ is empty and therefore $g(s + t n_s) = 0$. For $0 \le t \le r_n$ the points in the cap
are close to s and therefore the density in the cap is approximately p(s).

This integration is illustrated in Figure 4. It has two advantages: First, if x is far enough from S
(that is, $\mathrm{dist}(x, s) > r_n$ for all $s \in S$), then $g(x) = 0$ and the corresponding terms in the integral
vanish. Second, if x is close to $s \in S$ and the radius $r_n$ is small, then the density on the ball $B(x, r_n)$
can be considered approximately uniform, that is, we assume $p(y) = p(s)$ for all $y \in B(x, r_n)$. Thus,
with $x = s + t n_s$,

$$\int_0^{\infty} g(s + t n_s)\, p(s + t n_s)\, dt = \int_0^{r_n} g(s + t n_s)\, p(s + t n_s)\, dt$$
$$= p(s) \int_0^{r_n} g(s + t n_s)\, dt = p^2(s) \int_0^{r_n} \int_{B(x, r_n) \cap H^-} f_n(\mathrm{dist}(x, y))\, dy\, dt$$
$$= \eta_{d-1} \int_0^{r_n} u^d f_n(u)\, du \; p^2(s),$$

where the last step follows with Lemma 3.

Since this integral of the weight function $f_n$ over the "caps" plays such an important role in the
derivation of our results we introduce a special notation for it: For a radius $r \in \mathbb{R}_{\ge 0}$ and q = 1, 2
we define

$$F_C^{(q)}(r) = \eta_{d-1} \int_0^r u^d f_n^q(u)\, du.$$

Although these integrals also depend on n we do not make this dependence explicit. In fact, the
parameter r is replaced by the radius $r_n$ in the case of the r-neighborhood graph or by a different
graph parameter depending on n for the other neighborhood graphs. Therefore the dependence
of $F_C^{(q)}(r_n)$ on n will be understood. Note that we allow the notation $F_C^{(q)}(\infty)$ if the improper
integral exists. The integral $F_C^{(q)}$ for q = 2 is needed for the following reason: For the U-statistics
bound on the variance term we do not only have to compute the expectation of $W_{ij}$, but also their
variance. But the variance can in turn be bounded by the expectation of $W_{ij}^2$, which is expressed
in terms of $F_C^{(2)}(r_n)$.
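
As a numerical sanity check, the cap integral can be evaluated with a simple quadrature rule. The Python sketch below is an illustration, not part of the proofs: it assumes that $\eta_{d-1}$ denotes the volume of the unit ball in $\mathbb{R}^{d-1}$, and the names `unit_ball_volume` and `cap_integral` as well as the midpoint rule are choices made here.

```python
import math


def unit_ball_volume(dim):
    """Volume of the unit ball in R^dim (equals 1 for dim = 0)."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)


def cap_integral(weight, r, d, q=1, steps=10000):
    """Midpoint-rule approximation of eta_{d-1} * int_0^r u^d * weight(u)^q du."""
    h = r / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** d * weight(u) ** q
    return unit_ball_volume(d - 1) * total * h
```

For the unit weight the integral has the closed form $\eta_{d-1} r^{d+1}/(d+1)$; for example, with d = 2 and r = 0.3 this is $2 \cdot 0.027 / 3 = 0.018$, which the quadrature reproduces.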

In the r-neighborhood graph points are only connected within a certain radius $r_n$, which means
that to compute $E(W_{12} \mid x_1 = x)$ we only have to integrate over the ball $B(x, r_n)$, since all other
points cannot be connected to $x_1 = x$. This is clearly different for the complete graph, where every
point is connected to every other point. The idea is to fix a radius $r_n$ in such a way as to make sure
that the contribution of edges to points outside $B(x, r_n)$ can be neglected, because their weight is
small. Since $W_{12} = f_n(\mathrm{dist}(x_1, x_2))$ if the points are on different sides of S we have for $x \in H^+$

$$E(W_{12} \mid x_1 = x) = \int_{B(x, r_n) \cap H^-} f_n(\mathrm{dist}(x, y))\, p(y)\, dy + \int_{B(x, r_n)^c \cap H^-} f_n(\mathrm{dist}(x, y))\, p(y)\, dy$$
$$\le g(x) + p_{\max} \int_{B(x, r_n)^c} f_n(\mathrm{dist}(x, y))\, dy.$$

For the Gaussian weight function the integral converges to zero very quickly if $r_n / \sigma_n \to \infty$ for
$n \to \infty$. Thus we can treat the complete graph almost as the r-neighborhood graph.
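
How quickly the contribution outside the ball vanishes can be checked numerically. The Python sketch below is an illustration under the assumption that the Gaussian weight is proportional to $\exp(-u^2/(2\sigma^2))$ (the normalization cancels in the ratio); the function name and the truncation of the radial integral at 12 standard deviations are choices made here.

```python
import math


def gaussian_tail_fraction(r, sigma, d, steps=20000, cutoff=12.0):
    """Fraction of the total Gaussian weight in R^d that lies outside the ball of radius r.

    Uses the radial form int u^(d-1) exp(-u^2 / (2 sigma^2)) du with a midpoint rule.
    """
    h = cutoff * sigma / steps
    inside = outside = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        val = u ** (d - 1) * math.exp(-u * u / (2.0 * sigma * sigma))
        if u <= r:
            inside += val
        else:
            outside += val
    return outside / (inside + outside)
```

In dimension d = 2 the fraction equals $\exp(-r^2/(2\sigma^2))$ exactly, so it decays faster than any polynomial in $r/\sigma$.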

For the k-nearest neighbor graph the connectedness of points depends on their k-nearest neighbor
radii, that is, the distance of the point to its k-th nearest neighbor, which is itself a random variable.
However, one can show that with very high probability the k-nearest neighbor radius of a point in
a region with uniform density p is concentrated around $(k_n / ((n-1)\, \eta_d\, p))^{1/d}$. Since we assume that
$k_n / n \to 0$ for $n \to \infty$ the expected kNN radius converges to zero. Thus the density in balls with
this radius is close to uniform and the estimate becomes more accurate. Upper and lower bounds
on the k-nearest neighbor radius that hold with high probability are given in Lemma 2. The idea
is to perform the integration above for both the lower bound on the kNN radius and the upper
bound on the kNN radius. Then it is shown that these integrals converge to the same limit.
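
The concentration of the kNN radius can be observed in a small simulation. The Python sketch below is illustrative only: it assumes a uniform density on the unit square (so p = 1 and d = 2), takes $\eta_d$ to be the volume of the unit ball in $\mathbb{R}^d$, and uses hypothetical function names.

```python
import math
import random


def unit_ball_volume(dim):
    """Volume of the unit ball in R^dim."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)


def predicted_knn_radius(k, n, density, d):
    """The radius (k / ((n - 1) * eta_d * p))^(1/d) around which the kNN radius concentrates."""
    return (k / ((n - 1) * unit_ball_volume(d) * density)) ** (1.0 / d)


def empirical_knn_radius(k, n, rng):
    """kNN radius of the point (0.5, 0.5) among n - 1 further points uniform on [0, 1]^2."""
    center = (0.5, 0.5)
    dists = sorted(math.dist(center, (rng.random(), rng.random())) for _ in range(n - 1))
    return dists[k - 1]
```

With k = 50 and n = 2000 the predicted radius is about 0.089, far from the boundary of the square, and averaging `empirical_knn_radius` over a few independent samples gives a value close to this prediction.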

Fifth step: Bias of the volume terms 

The bias of the volume term can be treated similarly to the cut term. We define $W_{ij} = f_n(\mathrm{dist}(x_i, x_j))$
if $x_i$ and $x_j$ are connected in the graph and $W_{ij} = 0$ otherwise. Note that we do not need the
condition that the points have to be on different sides of the hyperplane S as for the cut. Then,
for a point $x \in C$, if we assume that the density is uniform within distance $r_n$ around x,

$$E(W_{12} \mid x_1 = x) = \int_{B(x, r_n)} f_n(\mathrm{dist}(x, y))\, p(y)\, dy = p(x) \int_{B(x, r_n)} f_n(\mathrm{dist}(x, y))\, dy$$
$$= d\, \eta_d \int_0^{r_n} u^{d-1} f_n(u)\, du \; p(x),$$

where the last integral transform follows with Lemma 5. Integrating over $\mathbb{R}^d$ we obtain

$$E(W_{12}) = \int_{\mathbb{R}^d} E(W_{12} \mid x_1 = x)\, p(x)\, dx = d\, \eta_d \int_0^{r_n} u^{d-1} f_n(u)\, du \int_{\mathbb{R}^d} p^2(x)\, dx.$$

Since the integral over the balls is so important in the formulation of our general results we often
call it the "ball integral" and introduce the notation

$$F_B^{(q)}(r) = d\, \eta_d \int_0^r u^{d-1} f_n^q(u)\, du$$

for a radius $r \ge 0$ and q = 1, 2. The remarks that were made on the "cap integral" $F_C(r)$ above
also apply to the "ball integral" $F_B(r)$.

Figure 5: The structure of the proofs in this section. Propositions 1 and 4 state bounds for
general weight functions on the bias and the variance term of the cut and the volume, respectively.
Lemma 2 shows the concentration of the kNN radii, Lemma 11 is needed to bound the influence
of points close to the boundary. Lemmas 3 and 5 perform the integration of the weight function
over "caps" and "balls". In Lemmas 8 to 10 the general "ball" and "cap" integrals are evaluated for
the specific weight functions we use. Using these results, Corollaries 1 to 3 dealing with the cut and
Corollaries 4 to 6 dealing with the volume are proved. Finally, in Theorem 1 the convergence of NCut
and CheegerCut is analyzed using the results of these corollaries.



Sixth step: Plugging in the weight functions 

Having derived results on the bias term of the cut and volume for general weight functions, we can
now plug in the specific weight functions in which we are interested in this paper. This boils down
to the evaluation of the "cap" and "ball" integrals $F_C(r_n)$ and $F_B(r_n)$ for these weight functions.
For the unit weight function the integrals can be computed exactly, whereas for the Gaussian
weight function we study the asymptotic behavior of the "cap" and "ball" integral in the cases
$r_n / \sigma_n \to 0$ and $r_n / \sigma_n \to \infty$ for $n \to \infty$.
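
For the unit weight function the exact value of the "ball" integral is easy to verify numerically. The Python sketch below is an illustration only; it assumes that $\eta_d$ is the volume of the unit ball in $\mathbb{R}^d$, in which case $d\,\eta_d \int_0^r u^{d-1}\, du = \eta_d r^d$ is simply the volume of a ball of radius r. The function names and the midpoint rule are choices made here.

```python
import math


def unit_ball_volume(dim):
    """Volume of the unit ball in R^dim."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)


def ball_integral(weight, r, d, q=1, steps=10000):
    """Midpoint-rule approximation of d * eta_d * int_0^r u^(d-1) * weight(u)^q du."""
    h = r / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** (d - 1) * weight(u) ** q
    return d * unit_ball_volume(d) * total * h


def ball_integral_unit_weight(r, d):
    """Closed form for the unit weight: the volume of a ball of radius r in R^d."""
    return unit_ball_volume(d) * r ** d
```

Passing a Gaussian weight instead of the constant one gives a quick way to explore the two asymptotic regimes of $r_n / \sigma_n$ numerically.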



6.2 Proofs for the k-nearest neighbor graph

As we have already mentioned we will give the proofs of our general propositions in detail here
and then discuss in Section 6.3 how they have to be adapted to the r-neighborhood graph and the
complete weighted graph. This means that Lemmas 3 and 5, which are necessary for the proof of
the general propositions, can be found in this section, although they are also needed for the r-graph
and the k-nearest neighbor graph.

This section consists of four subsections: In Section 6.2.1 we define some quantities that help us
to deal with the fact that the connectivity between two points is random even if we know their
distance. These quantities will play an important role in the succeeding sections. Section 6.2.2
presents the results for the cut term, whereas Section 6.2.3 presents the results for the volume term.
Finally, these results are used to prove Theorem 1, the main theorem for the k-nearest neighbor
graph, in Section 6.2.4.

In the subsections on the cut term and the volume term we always present the proposition for
general weight functions first. Then the lemmas follow that are used in the proof of the proposition.
Finally, we show corollaries that apply these general results to the specific weight functions we
consider in this paper. An overview of the proof structure is given in Figure 5.



6.2.1 k-nearest neighbor radii

As we have explained in Section 6.1 the basic ideas of our convergence proofs are similar for all
the graphs. However, there is one major technical difficulty for the k-nearest neighbor graph: The
existence of an edge between two points depends on all the other sample points and it is random,
even if we know the distance between the points. However, each sample point is connected to
its k nearest neighbors, that is, to all points with a distance not greater than that of the k-th
nearest neighbor. This distance is called the k-nearest neighbor radius of point $x_i$. Unfortunately,
given a sample point we do not know this radius without looking at all the other points. The idea
to overcome this difficulty is the following: Given the position of a sample point we give lower and
upper bounds on the kNN radius that depend on the density around the point and show that with
high probability the true radius is between these bounds. Then we can replace the integration
over balls of a fixed radius with the integration over balls with the lower and upper bound on the
kNN radius in the proof for the bias term and then show that these integrals converge towards
each other. Furthermore, under our assumptions the radius of all the points can be bounded from
above, which helps to bound the influence of far-away points.

In this section we formally define the bounds on the k-nearest neighbor radii, since these will
be used in the statement of the general proposition. In Lemma 2 we state the bounds on the
probabilities that the true kNN radius is between our bounds for the cases we need in the proofs.

We first introduce the upper bound $r_n^{\max}$ on the maximum k-nearest neighbor radius of a point,
which does not depend on its position. Second, we use that, given a point x (far enough) in the interior of
C, the conditional kNN radius of a sample point at x is highly concentrated around a radius $r_n(x)$.
Formally, we define

$$r_n^{\max} = \sqrt[d]{\,\cdots\,} \quad \text{and} \quad r_n(x) = \sqrt[d]{\frac{k_n}{(n-1)\, p(x)\, \eta_d}} \quad \text{for all } x \in C.$$

As to the concentration, we state sequences of lower and upper bounds, $r_n^-(x)$ and $r_n^+(x)$, that
converge to $r_n(x)$ such that for all $x \in C$ that are not in a small boundary strip the probability
that a point at x is connected to a point at y becomes small if the distance between x and y exceeds
$r_n^+(x)$ and becomes large if the distance is smaller than $r_n^-(x)$.

Clearly, the accuracy of the bounds depends on how much the density can vary around x. Setting
$\xi_n = 2 p'_{\max} r_n^{\max} / p_{\min}$, the density in the ball of radius $2 r_n^{\max}$ around x can vary between $(1 - \xi_n) p(x)$
and $(1 + \xi_n) p(x)$. Furthermore, we have to "blow up" or shrink the radii a bit in order to be sure
that the true kNN radius is between them. To this end we introduce a sequence $(\delta_n)_{n \in \mathbb{N}}$ with
$\delta_n \to 0$ and $\delta_n k_n \to \infty$ for $n \to \infty$. Then we can define

$$r_n^-(x) = \sqrt[d]{(1 - 2\xi_n)(1 - \delta_n)}\; r_n(x) \quad \text{and} \quad r_n^+(x) = \sqrt[d]{(1 + 2\xi_n)(1 + \delta_n)}\; r_n(x).$$

Note that $\xi_n$ converges to zero, since $r_n^{\max}$ converges to zero as $\sqrt[d]{k_n / n}$. The sequence $\delta_n$ is chosen
such that it converges to zero reasonably fast, but such that with high probability $r_n^+(x)$ and $r_n^-(x)$ are
bounds on the kNN radius of a point at x.
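
To get a feeling for how tight these bounds are, one can evaluate them for concrete parameters. The Python sketch below rests on several assumptions that should be checked against the original typeset formulas: the d-th root form of $r_n^-(x)$ and $r_n^+(x)$ as read here, $\eta_d$ as the volume of the unit ball, and the choice $\delta_n = \sqrt{8 \delta_0 \log n / k_n}$ with $\delta_0 > 2$ taken from Proposition 1. The function name and the example value of $\xi_n$ are hypothetical.

```python
import math


def knn_radius_bounds(k, n, density, d, xi, delta0=2.5):
    """Return (r_minus, r, r_plus) for the kNN radius at a point with the given density.

    Assumes r = (k / ((n - 1) * p * eta_d))^(1/d) and
    r_minus, r_plus = ((1 -/+ 2 xi)(1 -/+ delta))^(1/d) * r.
    """
    eta_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    r = (k / ((n - 1) * density * eta_d)) ** (1.0 / d)
    delta = math.sqrt(8.0 * delta0 * math.log(n) / k)
    if delta >= 1.0 or xi >= 0.5:
        raise ValueError("bounds are only meaningful for delta < 1 and xi < 1/2")
    r_minus = ((1.0 - 2.0 * xi) * (1.0 - delta)) ** (1.0 / d) * r
    r_plus = ((1.0 + 2.0 * xi) * (1.0 + delta)) ** (1.0 / d) * r
    return r_minus, r, r_plus
```

The check `delta < 1` shows that k has to grow faster than log n before the lower bound becomes informative.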

In order to quantify the probability of connections, which we seek to bound, we define the function
$c : \mathbb{R}^d \times \mathbb{R}^d \to [0, 1]$ by

$$c(x, y) = \begin{cases} \Pr(C_{12} \mid x_1 = x, x_2 = y) & \text{if } x \in C \text{ and } y \in C \\ 0 & \text{otherwise,} \end{cases}$$

where $C_{12}$ denotes the event that there is an edge between the sample points $x_1$ and $x_2$ in the
(directed or undirected) k-nearest neighbor graph.



6.2.2 The cut term in the kNN graph 

Proposition 1 Let $G_n$ be the directed, symmetric or mutual k-nearest neighbor graph with a
monotonically decreasing weight function $f_n$. Set $\delta_n = \sqrt{(8 \delta_0 \log n) / k_n}$ for some $\delta_0 > 2$ in the
definition of $r_n^-(x)$ and $r_n^+(x)$. Then we have for the bias term



E (^)- 2 L As)F ° )(rnis)) ds 



ohp (rn ^ 



+ O ( min <j n-°°f n ( inf r„(x) ) , F%\oo) ~ F<£ ] ( inf r„(x) 

1+1/d 



Furthermore, we have for the variance term for a suitable constant C 

Ce 2 \ 



Pr ( cut,, — E (cut** 5 ) > ej < 2exp -- 



nklfm / 

Proof. We define for i, j € {1, . . . , n}, i ^ j the random variable Wij as 



/ n (dist(xi, Xj) if Xj € H + ,Xj G if and (xi,Xj) edge in G„ 
otherwise. 



For both a directed and an undirected graph we have

$$\mathrm{cut}_n = \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} W_{ij},$$

and by the linearity of expectation and the fact that the points are independent and identically
distributed, we have

$$E\left( \frac{\mathrm{cut}_n}{n(n-1)} \right) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} E(W_{ij}) = \frac{n(n-1)}{n(n-1)}\, E(W_{12}) = E(W_{12}).$$



In the convergence proof for the variance term of the cut for the r-neighborhood graph in
Proposition 6 we need a bound on $E(W_{12}^2)$. Since this can be derived similarly to $E(W_{12})$ we state the
following for $E(W_{12}^q)$ for q = 1, 2.

We define $C_{12}$ to be the event that the sample points $x_1$ and $x_2$ are connected in the graph.
Conditioning on the location of the points $x_1 \in C$ and $x_2 \in C$ we obtain $W_{12} = 0$ if $x_1$ and $x_2$ are on
the same side of the hyperplane S, otherwise

$$W_{12} = \begin{cases} f_n(\mathrm{dist}(x_1, x_2)) & \text{if } C_{12} \text{ occurs} \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, if $x_1 \in C$ and $x_2 \in C$ are on different sides of S

$$E(W_{12}^q \mid x_1 = x, x_2 = y) = f_n^q(\mathrm{dist}(x, y))\, \Pr(C_{12} \mid x_1 = x, x_2 = y).$$

With c(x, y) as above we have

$$E(W_{12}^q) = \int_C \int_C E(W_{12}^q \mid x_1 = x, x_2 = y)\, p(y)\, dy\, p(x)\, dx$$
$$= \int_{H^+ \cap C} \int_{H^- \cap C} f_n^q(\mathrm{dist}(x, y))\, \Pr(C_{12} \mid x_1 = x, x_2 = y)\, p(y)\, dy\, p(x)\, dx$$
$$\quad + \int_{H^- \cap C} \int_{H^+ \cap C} f_n^q(\mathrm{dist}(x, y))\, \Pr(C_{12} \mid x_1 = x, x_2 = y)\, p(y)\, dy\, p(x)\, dx$$
$$= \int_{H^+} \int_{H^-} f_n^q(\mathrm{dist}(x, y))\, c(x, y)\, p(y)\, dy\, p(x)\, dx + \int_{H^-} \int_{H^+} f_n^q(\mathrm{dist}(x, y))\, c(x, y)\, p(y)\, dy\, p(x)\, dx.$$

Setting

$$g(x) = \begin{cases} \int_{H^-} f_n^q(\mathrm{dist}(x, y))\, c(x, y)\, p(y)\, dy & \text{if } x \in H^+ \\ \int_{H^+} f_n^q(\mathrm{dist}(x, y))\, c(x, y)\, p(y)\, dy & \text{if } x \in H^- \end{cases}$$

we obtain

$$E(W_{12}^q) = \int_{\mathbb{R}^d} g(x)\, p(x)\, dx = \int_{H^+} g(x)\, p(x)\, dx + \int_{H^-} g(x)\, p(x)\, dx.$$

We only deal with the first integral here, the second can be computed analogously. By a simple
transformation of the coordinate system we can write this integral as an integral along the
hyperplane S, and for each point s in S we integrate over the normal line through s. In the following
we find lower and upper bounds on the integral

$$\int_S \int_0^{\infty} g(s + t n_s)\, p(s + t n_s)\, dt\, ds = \int_S h_n(s)\, ds,$$

where we have set

$$h_n(s) = \int_0^{\infty} g(s + t n_s)\, p(s + t n_s)\, dt.$$

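We set $\mathcal{I}_n = \{x \in C \mid \mathrm{dist}(x, \partial C) > 2 r_n^{\max}\}$ and use the following decomposition of the integral

$$\left| \int_S h_n(s)\, ds - \int_{S \cap C} p^2(s)\, F_C^{(q)}(r_n(s))\, ds \right| \le \left| \int_S h_n(s)\, ds - \int_{S \cap \mathcal{I}_n} h_n(s)\, ds \right| \qquad (5)$$
$$+ \left| \int_{S \cap \mathcal{I}_n} h_n(s)\, ds - \int_{S \cap \mathcal{I}_n} p^2(s)\, F_C^{(q)}(r_n(s))\, ds \right| \qquad (6)$$
$$+ \left| \int_{S \cap \mathcal{I}_n} p^2(s)\, F_C^{(q)}(r_n(s))\, ds - \int_{S \cap C} p^2(s)\, F_C^{(q)}(r_n(s))\, ds \right|. \qquad (7)$$

We first give a bound on the right hand side of Equation (5). Setting $\mathcal{R}_n = \{x \in \mathbb{R}^d \mid \mathrm{dist}(x, \partial C) \le 2 r_n^{\max}\}$
and $\mathcal{A}_n = \mathbb{R}^d \setminus (\mathcal{I}_n \cup \mathcal{R}_n)$, we have (considering that the integrand is positive and $S \cap \mathcal{I}_n \subseteq S$)

$$\left| \int_S h_n(s)\, ds - \int_{S \cap \mathcal{I}_n} h_n(s)\, ds \right| = \int_{S \cap \mathcal{R}_n} h_n(s)\, ds + \int_{S \cap \mathcal{A}_n} h_n(s)\, ds,$$

that is, we have to derive upper bounds on the two integrals on the right hand side.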

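First let $s \in S \cap \mathcal{A}_n$, that is, $s \notin C$ and $\mathrm{dist}(s, \partial C) > 2 r_n^{\max}$. Consequently $p(s + t n_s) = 0$ for
$t < 2 r_n^{\max}$. On the other hand, if $t \ge 2 r_n^{\max}$ we have $\mathrm{dist}(s + t n_s, y) \ge 2 r_n^{\max}$ for all $y \in H^-$.
Setting $c_n = 2 \exp(-k_n / 8)$ we have with Lemma 2 $c(s + t n_s, y) \le c_n$ for all $y \in H^-$. Hence

$$g(s + t n_s) \le \int_{B(s + t n_s, r_n^{\max}) \cap H^-} f_n^q(\mathrm{dist}(s + t n_s, y))\, c(s + t n_s, y)\, p(y)\, dy$$
$$\quad + \int_{B(s + t n_s, r_n^{\max})^c \cap H^-} f_n^q(\mathrm{dist}(s + t n_s, y))\, c(s + t n_s, y)\, p(y)\, dy$$
$$\le f_n^q(r_n^{\max}) \int_{B(s + t n_s, r_n^{\max})^c \cap H^-} c(s + t n_s, y)\, p(y)\, dy \le c_n f_n^q(r_n^{\max}),$$

since $B(s + t n_s, r_n^{\max}) \cap H^- = \emptyset$ for $t > r_n^{\max}$ and $f_n$ is monotonically decreasing. Therefore, for
all $s \in S \cap \mathcal{A}_n$

$$h_n(s) = \int_0^{\infty} g(s + t n_s)\, p(s + t n_s)\, dt \le \int_{2 r_n^{\max}}^{\infty} g(s + t n_s)\, p(s + t n_s)\, dt \le c_n f_n^q(r_n^{\max}) \int_{2 r_n^{\max}}^{\infty} p(s + t n_s)\, dt,$$

and thus

$$\int_{S \cap \mathcal{A}_n} h_n(s)\, ds \le \int_{S \cap \mathcal{A}_n} c_n f_n^q(r_n^{\max}) \int_0^{\infty} p(s + t n_s)\, dt\, ds$$
$$\le c_n f_n^q(r_n^{\max}) \int_S \int_0^{\infty} p(s + t n_s)\, dt\, ds \le c_n f_n^q(r_n^{\max}).$$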

Now let $s \in S \cap \mathcal{R}_n$. Then

g(s + tn s ) = / /«(dist(s + tn s , y))c(s + tns, y)p{v) dy 
J h- 

< I f q (dist(s + tns,y))c(s + tn s ,y)p(y)dy 

J B(s+tns,r^ x )nH- 

/«(dist(s + tn s , y))c(s + tn s , y)p(y) dy 

< JW / /„ 9 (dist( S + tn s , y)) dy + c„/« (r^) ■ 

Considering that $B(s + t n_s, r_n^{\max}) \cap H^- = \emptyset$ for $t > r_n^{\max}$ and therefore the first integral vanishes
in this case, we have for all $s \in S \cap \mathcal{R}_n$



h„(s) = / g(s + tn s )p(s + tn s ) dt 



/ f n{dist(s + tn s , y)) dy p(s + tn s ) dt 

J B(s+tn s ,r™ ax )nH- 
/>oo 

+ c„/«(C ax ) / P(s + tn s ) dt 



/ / f%(dist(s + tn s ,y)) dy dt 

Jo J B(s+tn s ,r™ ax )nH- 

/>oo 

+ Cnf q n (C ax ) / P(s + tn s )df 

JO 

< pLx4 ?) (C ax ) + c n P n (C ax ) / p {a + tn s ) dt, 



and thus 



/ K (a) ds < f Pl 2 nax 4 9) (r« lax ) + c n p n (C ax ) fp( S + ins) dt ds 
Jsnn n JsnK n Jo 

< pLA q) (C ax ) Cd-i (s n n n ) + Cn p n (c ax ) . 



For some weight functions, for example the Gaussian, it is preferable to use that for all $x \in \mathbb{R}^d$
and all radii r



/ /«(dist(x, y))c(x, y)p{y) dy < / mdist(x,y)) dy 

J B(x, r ynH- JB(x,r)" 

= 1W ( f /'(distfoi,)) dy- [ /«(diflt(x s y)) dy) = p max - F ( B q) (r)) . 

» -'R d JB{x,r) ' 



We have according to Lemma 11 $\mathcal{L}_{d-1}(S \cap \mathcal{R}_n) = O(r_n^{\max})$. Consequently, using $r_n^{\max} = O(\sqrt[d]{k_n / n})$
and plugging in $c_n$

/ h n (s) da- h n (s) ds 

Js Jsm n 

= 0\FM (C ax ) ^+min|exp(-fc n /8)/« (inf ,„(,)), (^(oo) - ^( C ax )) }) • 
Now we consider the term in Equation (6). In the following, note that with $\xi_n = 2 p'_{\max} r_n^{\max} / p_{\min}$
we have for all $x \in C$ with $B(x, 2 r_n^{\max}) \subseteq C$ and $y \in B(x, 2 r_n^{\max})$

$$(1 - \xi_n)\, p(x) \le p(y) \le (1 + \xi_n)\, p(x).$$

We assume that n is sufficiently large such that $\xi_n < 1/2$.
For any $s \in S \cap \mathcal{I}_n$ and any $t \ge 0$ we have

g(s + tn s ) = / /*(dist(s + tn s , y))c(s + tn s , y)p(y) dy 
J h- 



> / f%(dist(s + tn s ,y))c(s + tn s ,y)p(y) dy. 

J B(s+tn s ,r n (s))nH~ 

If $t > r_n^-(s)$ we use the trivial bound $g(s + t n_s) \ge 0$. Otherwise we have with Lemma 2 for
all $y \in B(s + t n_s, r_n^-(s)) \cap H^-$ that $c(s + t n_s, y) \ge 1 - a_n$ with $a_n = 6 \exp(-\delta_n^2 k_n / 3)$. Using,
furthermore, the bound $p(y) \ge (1 - \xi_n)\, p(s)$ we obtain

g{s + tn s ) > / /^(dist(s + tn s ,y)){l - a„)(l - £ n )p(s) dy 

J B(s+tn S: r n (s))nH~ 

= (1 - o»)(l - £ n )p(s) / /^(dist(s + tn s , y)) dy. 

J B(s+tn s ,r n (s))nH- 

That is, we obtain for s G X n 

poo r r ^i s ) 
h n (s) = / g (s + tns) p (s + tns) dt > g (s + tn s )p(s + tn s ) dt 

Jo Jo 

> / g{s + tn s ) dt 
Jo 

> (1 - a n )(l - Cn)V(s) / " ( } f /^(dist(s + in s , y)) dy di 

Jo J B(s+tn s ,r n (s))nH- 

>(l-a„)(l-e n )V( S )4 9) (rnW). 
where in the last inequality we have applied Lemma [3] 
Therefore 

h n (s) ds > (1 - a n )(l - e„) 2 / P 2 (s)4 ?) (»•*(«)) ds 

Sni„ Jsm n 
> (1 - a„)(l - C„) 2 / p 2 ( S )4 9) (r„(s)) ds 



sra« 

ds 



p 2 (s) (i#> (r n (s))-F^ (r-(s)) 



> 



p 2 (s)F^ (r n (s)) ds - (a n + £„) / P 2 {s)F ( c q) (r n (s)) ds 
snx„ Jsm n 



and thus 



pL* / (4 9) (r n ( S ))-4 9) (r"( S ))) ds, 



K(s) ds- P \s)F ( c q) (r n (s)) ds 

Sni„ Jsnx n 

2/- M?(?) 



> -(an + 6.) / p 2 (s)i^ (r„(s)) ds 



sni„ 



pL^-i(SnC) sup (4 9) (r+(s))-4 9) (r n (s))). (8) 
Now we want to find an upper bound on $g(s + t n_s)$ for $s \in S \cap \mathcal{I}_n$, that is, $B(s, 2 r_n^{\max}) \subseteq C$. We
use the following decomposition

g(s + tns) = / /n(dist(s + tns, y))c(s + tn s , y)p{y) dy 
Jh- 



< 



< / f%(dist(s + tn s ,y))c(s + tns,y)p(y) dy 

J B(s+tn s ,r£(s))nH- 

/«(dist(s + tn s , y))c(s + tn s , y)p(y) dy. 

B{s+tn s ,r+(s)y-nH- 

We use in the first term the trivial bound $c(s + t n_s, y) \le 1$ and in the second term the monotonicity of $f_n$ and the bound $b_n = 6\exp(-\delta_n^2 k_n/4)$ on the probability of connectedness when the distance is greater than $r_n^+(s)$ from Lemma 2 to obtain

\[
\begin{aligned}
g(s + t n_s) &\le \int_{B(s + t n_s, r_n^+(s)) \cap H^-} f_n^q(\mathrm{dist}(s + t n_s, y))\, p(y)\, dy + b_n f_n^q(r_n^+(s)) \int_{B(s + t n_s, r_n^+(s))^c \cap H^-} p(y)\, dy \\
&\le \int_{B(s + t n_s, r_n^+(s)) \cap H^-} f_n^q(\mathrm{dist}(s + t n_s, y))\, p(y)\, dy + b_n f_n^q(r_n^+(s)).
\end{aligned}
\]

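Using a bound on the density in the balls $B(s + t n_s, r_n^+(s))$ we obtain

\[ g(s + t n_s) \le (1 + \xi_n)\, p(s) \int_{B(s + t n_s, r_n^+(s)) \cap H^-} f_n^q(\mathrm{dist}(s + t n_s, y))\, dy + b_n f_n^q(r_n^+(s)), \]

and observe that $g(s + t n_s) \le b_n f_n^q(r_n^+(s))$ if $t > r_n^+(s)$, since in this case $B(s + t n_s, r_n^+(s)) \cap H^- = \emptyset$. That is,

\[
\begin{aligned}
h_n(s) &= \int_0^\infty g(s + t n_s)\, p(s + t n_s)\, dt \\
&\le \int_0^{r_n^+(s)} (1 + \xi_n)\, p(s) \int_{B(s + t n_s, r_n^+(s)) \cap H^-} f_n^q(\mathrm{dist}(s + t n_s, y))\, dy\; p(s + t n_s)\, dt \\
&\quad + \int_0^\infty b_n f_n^q(r_n^+(s))\, p(s + t n_s)\, dt \\
&\le (1 + \xi_n)^2 p^2(s) \int_0^{r_n^+(s)} \int_{B(s + t n_s, r_n^+(s)) \cap H^-} f_n^q(\mathrm{dist}(s + t n_s, y))\, dy\, dt \\
&\quad + b_n f_n^q(r_n^+(s)) \int_0^\infty p(s + t n_s)\, dt \\
&= (1 + \xi_n)^2 p^2(s)\, F_C^{(q)}(r_n^+(s)) + b_n f_n^q(r_n^+(s)) \int_0^\infty p(s + t n_s)\, dt.
\end{aligned}
\]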

Therefore, considering that < 1/2 

K(s) ds < (1 + £ n ) 2 / P 2 (sH q) (r+(s)) ds 

+ bn [ II (r+(a)) / p (s + in s ) dt ds 
Jsm n Jo 

<(l + 3£„)f P 2 (s)4 9) (r n (s)) ds 
Jsm n 



snx, 



P 2 {s) (r B (»)) 



ds 
Consequently,



h n (s) ds- / P 2 {s)F { ( ? ) (r«(a)) ds 



< 3 Pl 2 nax sup (r+(a)) - F$> (r n (s))) C d ^(S n C) 

+ 3£„ / p 2 (s)4 9) (r„(a)) ds + b n p n ( inf r+(a)) 

Similarly to the remark above we can replace b n f% (inf sS 5 n c r+(s)) by 

fc-xta*)-^^ inf r n (s)) 

which gives a better bound for some weight functions, especially the Gaussian. 

Combining Equation (8) and Equation (9), using the monotonicity of $F_C^{(q)}$ and $f_n$ we obtain

h n (s) ds- f P 2 {s)F ( c q) (r„(s)) ds 



(9) 



= <?( sup (4 9) (r+( S ))-4 9) (r"( S ))) 



+ O f(a„ + ^„)4 9) (C ax ) + min |& n /« [fof r n (a)J , F^(oo) - fj* 5 ( mf r n {x)) 

We still have to bound the first term. For some weight functions, especially the Gaussian, we have

\[ \sup_{s \in S \cap C} \left( F_C^{(q)}(r_n^+(s)) - F_C^{(q)}(r_n^-(s)) \right) \le F_C^{(q)}(\infty) - F_C^{(q)}\Big(\inf_{x \in C} r_n^-(x)\Big). \]

For the other weight functions we use 

F^ (r+( S )) (r"( S )) = r iS) u d PM du- u*ft(u) du 

Jo Jo 

< 



II (<■-(,)) J'"' u* du = (r"(.)) ((r,|(«))" +1 - (r-(«))" +I ) 



d 

Since, with £ n < 1/2 and <5„ < 1, 

'^(s)^ 1 /(l + 2en)(l + 2^)fc n (n-lMs)r? d V +1/d 



r n (s) J V {n-l)p(s)rj d k n 

= ((1 + 2f n )(l + 2£„)) 1+1/d < 1 + 54£„ + 85„ 



and a similar bound holds for the other quotient we have 

4 ?) (r+(a)) - F^ (r-(s)) = O + 5 n )f« (fof r~(x)j (C ax ) d+1 ) • 

With our choice of $\delta_n$ we have, considering that $\delta_0 \ge 2$,

\[ a_n = 6\exp\left(-\delta_n^2 k_n/3\right) = 6\exp\left(-(8\delta_0 \log n)/3\right) \le 6\exp(-5\log n) = 6/n^5, \]

that is, for $n$ sufficiently large such that $6/n^5 \le \xi_n$, considering that $\xi_n \in O(\sqrt[d]{k_n/n})$ and plugging in $b_n$ we have



h n (s)ds- p 2 (.s)F^ (r n (s)) ds 

Snx„ Jsnx n 

= O (min | C ^ + ^ /« ( inf r" (*)) (C ax ) d+1 , (oo) - J#> ( inf r" 

(C aX ) +min|exp (-S 2 n ^j /« (infr n ( S )) , F^(oo) - F«( Jnf r B (»))} 



Finally, we bound the term in Equation Q . Setting TZ' n — C \ X n we have 



^(s)F^ (r„(s)) ds - / ^(s)F^ (r„(s)) ds 



SnX„ 



(9) 



snc 



p z {s)F^ (r n (s)) ds 



smz' 



< Pmax4 9) fmaxr„(x)) £ d _! (5 n K) < p 2 max F^ q) (m&xr n (x)) C d -i (S n ft„) 



Using Lemma 11 we have £d-i (S n lZ n ) — (3(r™ ax ), and thus 



p\s)F^ (r n («)) ds - / ^(s)F^ (r„(s)) ds 
Sni„ j snc 



n(9) 



= O ^maxr„(aO 



d I k n 
V n 



Deriving the same bounds for the other halfspace and collecting the three bounds we obtain the result, considering that $k_n/8 \ge \delta_n^2 k_n/8$, $\delta_n^2 k_n/4 \ge \delta_n^2 k_n/8$ and $r_n^{\max} \ge \max_{x \in C} r_n(x)$ due to the monotonicity of $F_C^{(q)}$.

Finally, we discuss the choice of $\delta_n$. With this choice of $\delta_n$ we have $\exp(-\delta_n^2 k_n/8) = n^{-\delta_0}$. Note that this is the fastest convergence rate of $\delta_n$ for which the exponential term converges polynomially in $1/n$, which we will need in the proof of the following corollaries. In all the other terms above $\delta_n$ has to be chosen as small as possible, so this is the best convergence rate for $\delta_n$. Note further that for this choice of $\delta_n$ we require $k_n/\log n \to \infty$, since $\delta_n$ has to converge to zero.
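The trade-off in the choice of $\delta_n$ can be checked numerically. The following is a minimal sketch, assuming the form $\delta_n = \sqrt{8\delta_0 \log n / k_n}$ suggested by the computation of $a_n$ above; the function names and the growth rate $k_n = (\log n)^3$ are hypothetical choices made only for illustration.

```python
import math


def delta_n(n, k_n, delta_0=2.0):
    """delta_n = sqrt(8 * delta_0 * log(n) / k_n) (assumed form of the choice in the proof)."""
    return math.sqrt(8.0 * delta_0 * math.log(n) / k_n)


def exponential_term(n, k_n, delta_0=2.0):
    """exp(-delta_n^2 * k_n / 8), which should coincide with n^(-delta_0)."""
    d = delta_n(n, k_n, delta_0)
    return math.exp(-d * d * k_n / 8.0)


if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7):
        # Hypothetical growth rate with k_n / log(n) -> infinity.
        k_n = int(math.log(n) ** 3)
        print(n, k_n, delta_n(n, k_n), exponential_term(n, k_n), n ** -2.0)
```

With this growth rate $\delta_n$ shrinks as $n$ increases, while the exponential term stays exactly polynomial in $1/n$.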

Now we prove the bound for the variance term. According to Corollary 3.2.3 from Miller et al. (1997) the maximum degree of the symmetric $k_n$-nearest neighbor graph is bounded by $(\tau_d + 1)k_n$, where $\tau_d$ denotes the kissing number in dimension $d$, that is, the maximum number of unit hyperspheres that touch another unit hypersphere without any intersections.

Thus, removing a point from the graph and inserting it in a different place, the number of (undirected) edges in the cut can change by at most $2(\tau_d + 1)k_n$. Since we count undirected edges twice we obtain for all types of $k$-nearest neighbor graphs

\[ \left| \mathrm{cut}_n - \mathrm{cut}_n^{(i)} \right| \le 4(\tau_d + 1)\, k_n f_n(0), \]

where $\mathrm{cut}_n^{(i)}$ denotes the value of the cut in a graph where exactly one point has been moved to a different place. Thus by McDiarmid's inequality for a suitable constant $C > 0$

\[ \Pr\left( \left| \mathrm{cut}_n - E(\mathrm{cut}_n) \right| > \varepsilon \right) \le 2\exp\left( -\frac{2\varepsilon^2}{n\left(4(\tau_d + 1)\, k_n f_n(0)\right)^2} \right) = 2\exp\left( -C\, \frac{\varepsilon^2}{n k_n^2 f_n^2(0)} \right). \]

□
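To make the variance bound concrete, the following sketch evaluates the McDiarmid tail for the bounded difference $4(\tau_d + 1) k_n f_n(0)$ derived above. The function names are hypothetical; the kissing number 6 for $d = 2$ is a known value, and the remaining numbers are arbitrary illustration values.

```python
import math


def cut_bounded_difference(k_n, tau_d, f_n_at_zero=1.0):
    """Maximal change of the cut when one sample point is moved: 4 (tau_d + 1) k_n f_n(0)."""
    return 4.0 * (tau_d + 1) * k_n * f_n_at_zero


def mcdiarmid_tail(eps, n, c):
    """McDiarmid bound 2 exp(-2 eps^2 / (n c^2)) for n independent points, each with bounded difference c."""
    return 2.0 * math.exp(-2.0 * eps ** 2 / (n * c ** 2))


if __name__ == "__main__":
    c = cut_bounded_difference(k_n=10, tau_d=6)  # kissing number in d = 2 is 6
    for eps in (1e3, 1e4, 1e5):
        print(eps, mcdiarmid_tail(eps, n=1000, c=c))
```

The bound only becomes informative once $\varepsilon$ is of larger order than $\sqrt{n}\, k_n f_n(0)$, which is why the corollaries rescale the cut before applying it.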



The following lemma states bounds on c(x, y), that is the probability of edges between points at x 
and y, in the cases that we need in the convergence proofs for the cut and the volume. 






Lemma 2 (kNN radii) Let $G_n$ be the directed, mutual or symmetric $k_n$-nearest neighbor graph. Let $k_n/n$ be sufficiently small such that $r_n^{\max} \le r_\gamma$. Then, if $x, y \in \mathbb{R}^d$ and $\mathrm{dist}(x, y) \ge r_n^{\max}$ we have $c(x, y) \le 2\exp(-k_n/8)$.

Set $\xi_n = 2 p'_{\max} r_n^{\max} / p_{\min}$ and define $\mathcal{I}_n = \{s \in C \mid B(s, 2r_n^{\max}) \subseteq C\}$. Let $n$ be sufficiently large such that $\xi_n < 1/2$ and let $\delta_n \in (0, 1)$ with $\delta_n \to 0$ for $n \to \infty$ and $k_n \delta_n > 1$ for sufficiently large $n$.

Let $x = s + t n_s$ with $s \in \mathcal{I}_n \cap S$. If $t \in \mathbb{R}_{\ge 0}$ and $y \in H^-$, or $t \in \mathbb{R}_{\le 0}$ and $y \in H^+$, and, furthermore, $\mathrm{dist}(x, y) \ge r_n^+(s)$, then $c(x, y) \le 6\exp(-\delta_n^2 k_n/4)$. The same holds for $x \in \mathcal{I}_n$ and $y \in C$ with $\mathrm{dist}(x, y) \ge r_n^+(x)$.

Let $x = s + t n_s$ with $t \in [0, r_n^-(s)]$ and $y \in H^-$, or $t \in [-r_n^-(s), 0]$ and $y \in H^+$. If $\mathrm{dist}(x, y) \le r_n^-(s)$ then $c(x, y) \ge 1 - 6\exp(-\delta_n^2 k_n/3)$. The same holds for $x \in \mathcal{I}_n$ and $y \in C$ with $\mathrm{dist}(x, y) \le r_n^-(x)$.

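Proof. We first show bounds on the probability of connectedness for the directed $k$-nearest neighbor graph. These are used in the second part of this proof in order to show bounds for the undirected graph as well. Let $D_{ij}$ denote the event that there exists an edge between $x_i$ and $x_j$ in the directed $k$-nearest neighbor graph.

First we show the statement concerning the maximal $k$-nearest neighbor radius. For any $x \in C$ we have

\[
\begin{aligned}
\mu\left(B(x, r_n^{\max})\right) &= \mu\left( B\left(x, \sqrt[d]{\frac{4}{\gamma p_{\min} \eta_d} \frac{k_n}{n-1}}\right) \right) \ge p_{\min}\, \mathcal{L}_d\left( B\left(x, \sqrt[d]{\frac{4}{\gamma p_{\min} \eta_d} \frac{k_n}{n-1}}\right) \cap C \right) \\
&\ge p_{\min}\, \gamma\, \mathcal{L}_d\left( B\left(x, \sqrt[d]{\frac{4}{\gamma p_{\min} \eta_d} \frac{k_n}{n-1}}\right) \right) = p_{\min}\, \gamma\, \eta_d\, \frac{4}{\gamma p_{\min} \eta_d} \frac{k_n}{n-1} = \frac{4 k_n}{n-1}.
\end{aligned}
\]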

Now suppose we fix $x_1$ and $x_2$ with $\mathrm{dist}(x_1, x_2) \ge r_n^{\max}$. If $U$ denotes the random variable that counts the number of points $x_3, \ldots, x_n$ in $B(x_1, r_n^{\max})$ we have $U \sim \mathrm{Bin}(n - 2, \mu(B(x_1, r_n^{\max})))$. Setting $V \sim \mathrm{Bin}(n - 2, 4k_n/(n - 1))$, we certainly have $k_n/(n - 2) \le 4k_n/(n - 1)$ for $n \ge 3$ and thus we obtain with a tail bound for the binomial distribution from Srivastav and Stangier (1996), which was first proved in Angluin and Valiant (1979),

\[ \Pr(D_{12}) \le \Pr(U < k_n) \le \Pr(V < k_n) \le \exp\left( -\frac{1}{2}\, \frac{\left( (n-2)\frac{4k_n}{n-1} - k_n \right)^2}{(n-2)\frac{4k_n}{n-1}} \right) \le \exp\left( -\frac{k_n}{8} \right). \]
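The tail bound can be sanity-checked by summing the binomial probabilities exactly. This is only an illustrative sketch with hypothetical function names; it compares $\Pr(V < k_n)$ for $V \sim \mathrm{Bin}(n - 2, 4k_n/(n - 1))$ with the bound $\exp(-k_n/8)$ for a few small parameter choices and requires $4k_n \le n - 1$ so that the success probability is valid.

```python
import math


def binomial_prob_below(k, m, p):
    """Exact Pr(V < k) for V ~ Bin(m, p)."""
    return sum(math.comb(m, j) * p ** j * (1.0 - p) ** (m - j) for j in range(k))


def tail_and_bound(n, k_n):
    """Return (Pr(V < k_n), exp(-k_n / 8)) for V ~ Bin(n - 2, 4 k_n / (n - 1))."""
    p = 4.0 * k_n / (n - 1)
    if not 0.0 <= p <= 1.0:
        raise ValueError("need 4 * k_n <= n - 1")
    return binomial_prob_below(k_n, n - 2, p), math.exp(-k_n / 8.0)


if __name__ == "__main__":
    for n, k_n in ((50, 5), (200, 10), (400, 20)):
        print(n, k_n, *tail_and_bound(n, k_n))
```

In these examples the exact tail is many orders of magnitude below $\exp(-k_n/8)$, which reflects that the bound in the proof is deliberately crude.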



In the following we show the statements concerning the upper bound $r_n^+(s)$ on the $k$-nearest neighbor radii of points in regions of relatively homogeneous density. The proof for the lower bound $r_n^-(s)$ is similar and is therefore omitted. Note, however, that the technical condition $\delta_n k_n > 1$ is needed for this case.

First we show how we can bound the density in the balls $B(s, 2r_n^{\max})$: for any $y \in B(s, 2r_n^{\max})$ we have by Taylor's theorem

\[ p(s) - 2p'_{\max} r_n^{\max} \le p(y) \le p(s) + 2p'_{\max} r_n^{\max}, \]

and thus, with $\xi_n = 2p'_{\max} r_n^{\max}/p_{\min}$,

\[ (1 - \xi_n)\, p(s) \le p(y) \le (1 + \xi_n)\, p(s). \]

These bounds are used below to bound the probability mass of balls within $B(s, 2r_n^{\max})$.

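Now, we bound the probability mass in $B(x, \mathrm{dist}(x, y))$ and $B(y, \mathrm{dist}(x, y))$ from below, when $\mathrm{dist}(x, y) \ge r_n^+(s)$. We first observe that

\[ r_n^+(s) = \sqrt[d]{\frac{(1 + 2\xi_n)(1 + \delta_n)\, k_n}{(n - 1)\, p(s)\, \eta_d}} \le \sqrt[d]{\frac{4 k_n}{(n - 1)\, \gamma\, p_{\min}\, \eta_d}}. \]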
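The ordering of the radii used throughout the proofs can be illustrated numerically. The sketch below assumes the forms $r_n(s) = (k_n/((n-1)\eta_d p(s)))^{1/d}$, $r_n^\pm(s)$ obtained by multiplying the argument of the root by $(1 \pm 2\xi_n)(1 \pm \delta_n)$, and $r_n^{\max} = (4k_n/((n-1)\gamma p_{\min}\eta_d))^{1/d}$, as far as they can be read off the surrounding text; the function names and all parameter values are hypothetical.

```python
import math


def unit_ball_volume(d):
    """Volume eta_d of the d-dimensional unit ball."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def knn_radii(n, k_n, d, p_s, p_min, gamma, xi_n, delta_n):
    """Return (r_minus, r_n, r_plus, r_max) under the assumed definitions.

    Requires xi_n < 1/2, delta_n < 1, p_s >= p_min and 0 < gamma <= 1.
    """
    eta_d = unit_ball_volume(d)
    base = k_n / ((n - 1) * p_s * eta_d)
    r_minus = ((1.0 - 2.0 * xi_n) * (1.0 - delta_n) * base) ** (1.0 / d)
    r_n = base ** (1.0 / d)
    r_plus = ((1.0 + 2.0 * xi_n) * (1.0 + delta_n) * base) ** (1.0 / d)
    r_max = (4.0 * k_n / ((n - 1) * gamma * p_min * eta_d)) ** (1.0 / d)
    return r_minus, r_n, r_plus, r_max


if __name__ == "__main__":
    print(knn_radii(n=10000, k_n=100, d=2, p_s=1.0, p_min=0.5,
                    gamma=0.5, xi_n=0.1, delta_n=0.2))
```

Since $(1 + 2\xi_n)(1 + \delta_n) \le 4$ for $\xi_n < 1/2$, $\delta_n < 1$, and $\gamma p_{\min} \le p(s)$, the four values are always returned in increasing order.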
Suppose $t = \mathrm{dist}(x, s) \le r_n^+(s)$. Then

\[ \mu\left(B(x, \mathrm{dist}(x, y))\right) \ge \mu\left(B(x, r_n^+(s))\right) \]

with $B(x, r_n^+(s)) \subseteq B(s, 2r_n^{\max})$. If $t = \mathrm{dist}(x, s) > r_n^+(s)$ we know that $\mathrm{dist}(x, y) \ge \mathrm{dist}(x, s)$, since $x$ and $y$ are on different sides of the hyperplane $S$. We set $x' = s + r_n^+(s)\, n_s$, that is the point on the line connecting $s$ and $x$ with distance $r_n^+(s)$ from $s$. Then, by construction, $B(x', r_n^+(s)) \subseteq B(s, 2r_n^{\max})$ and $B(x', r_n^+(s)) \subseteq B(x, \mathrm{dist}(x, s))$. Thus

\[ \mu\left(B(x, \mathrm{dist}(x, y))\right) \ge \mu\left(B(x, \mathrm{dist}(x, s))\right) \ge \mu\left(B(x', r_n^+(s))\right). \]

Now we consider balls around the other point $y$. First, suppose $\mathrm{dist}(y, s) \le r_n^+(s)$. Then

\[ \mu\left(B(y, \mathrm{dist}(x, y))\right) \ge \mu\left(B(y, r_n^+(s))\right) \]

with $B(y, r_n^+(s)) \subseteq B(s, 2r_n^{\max})$.

If $\mathrm{dist}(y, s) > r_n^+(s)$ we set $y' = s + r_n^+(s)\,(y - s)/\|y - s\|$, that is the point on the line connecting $s$ and $y$ with distance $r_n^+(s)$ from $s$. Then, by construction, $B(y', r_n^+(s)) \subseteq B(s, 2r_n^{\max})$ and $B(y', r_n^+(s)) \subseteq B(y, \mathrm{dist}(y, s))$. Since $x$ and $y$ are on different sides of $S$ we have $\mathrm{dist}(y, s) \le \mathrm{dist}(y, x)$. Therefore

\[ \mu\left(B(y, \mathrm{dist}(y, x))\right) \ge \mu\left(B(y, \mathrm{dist}(y, s))\right) \ge \mu\left(B(y', r_n^+(s))\right). \]

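We show how to bound $\mu(B(x, r_n^+(s)))$. The same bound can be shown for the probability mass in $B(x', r_n^+(s))$, $B(y, r_n^+(s))$ and $B(y', r_n^+(s))$, since all of these balls lie in $B(s, 2r_n^{\max})$. We have, since $\xi_n < 1/2$,

\[
\begin{aligned}
\mu\left(B(x, r_n^+(s))\right) &\ge (1 - \xi_n)\, p(s)\, \eta_d \left(r_n^+(s)\right)^d = (1 - \xi_n)\, p(s)\, \eta_d\, \frac{(1 + 2\xi_n)(1 + \delta_n)\, k_n}{(n - 1)\, p(s)\, \eta_d} \\
&= (1 - \xi_n)(1 + 2\xi_n)(1 + \delta_n)\, \frac{k_n}{n - 1} \ge (1 + \delta_n)\, \frac{k_n}{n - 1}.
\end{aligned}
\]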

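Let $U^+ \sim \mathrm{Bin}(n - 2, \mu(B(x, r_n^+(s))))$ and $V^+ \sim \mathrm{Bin}(n - 2, (1 + \delta_n) k_n/(n - 1))$. Then, we have for $(n - 2)\delta_n > 1$

\[ \frac{k_n}{n - 2} = \left(1 + \frac{1}{n - 2}\right) \frac{k_n}{n - 1} \le (1 + \delta_n)\, \frac{k_n}{n - 1} \]

and thus, by the tail bound from Angluin and Valiant (1979),

\[ \Pr(D_{12}) \le \Pr(U^+ < k_n) \le \Pr(V^+ < k_n) \le \exp\left( -\frac{1}{2}\, \frac{\left( (n - 2)(1 + \delta_n)\frac{k_n}{n - 1} - k_n \right)^2}{(n - 2)(1 + \delta_n)\frac{k_n}{n - 1}} \right). \]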

We have 



(n-2)(l + S„) 



<^nfej 



n — 1 
ri — 1 



n — 1 



(1 + £ n )fe, 



n - 1 



and 



(n-2)(l + * n ) 



k n 
n- 1 



= 1- 



n — 1 



(l + <5„>„ < 2fc„, 



and thus, using 8 n < 1, 
Pr(Di 2 ) < exp 



fej fen 



4fc„ 



< exp 



\2 iL 



< 3 exp 
This analysis can be carried over to the case $t > r_n^+(s)$ and the same bound holds.

The same bound holds also for $\Pr(D_{21})$, since the same bounds for the probability mass in the balls $B(y, r_n^+(s))$ and $B(y', r_n^+(s))$ hold.

In the final step of the proof we use the results derived so far to show the results for the undirected $k$-nearest neighbor graphs. For the mutual kNN graph we have by definition $\Pr(C_{12}) = \Pr(C_{21}) = \Pr(D_{12} \cap D_{21})$. Thus, clearly, $\Pr(C_{12}) \le \Pr(D_{12})$ and

\[
\begin{aligned}
\Pr(C_{12}) &= \Pr(D_{12} \cap D_{21}) = 1 - \Pr(D_{12}^c \cup D_{21}^c) \ge 1 - \Pr(D_{12}^c) - \Pr(D_{21}^c) \\
&= 1 - (1 - \Pr(D_{12})) - (1 - \Pr(D_{21})) = \Pr(D_{12}) + \Pr(D_{21}) - 1.
\end{aligned}
\]

This implies

\[
\begin{aligned}
\Pr(D_{12} \mid x_1 = x, x_2 = y) + \Pr(D_{21} \mid x_1 = x, x_2 = y) - 1 &\le \Pr(C_{12} \mid x_1 = x, x_2 = y) \\
&\le \Pr(D_{12} \mid x_1 = x, x_2 = y).
\end{aligned}
\]

For the symmetric kNN graph we have $\Pr(C_{12}) = \Pr(C_{21}) = \Pr(D_{12} \cup D_{21})$, which implies $\Pr(C_{12}) \ge \Pr(D_{12})$ and by a union bound $\Pr(C_{12}) \le \Pr(D_{12}) + \Pr(D_{21})$. Therefore

\[
\begin{aligned}
\Pr(D_{12} \mid x_1 = x, x_2 = y) &\le \Pr(C_{12} \mid x_1 = x, x_2 = y) \\
&\le \Pr(D_{12} \mid x_1 = x, x_2 = y) + \Pr(D_{21} \mid x_1 = x, x_2 = y).
\end{aligned}
\]

Thus, using the worse of the two possible bounds, we obtain for both undirected kNN graph types

\[
\begin{aligned}
\Pr(D_{12} \mid x_1 = x, x_2 = y) + \Pr(D_{21} \mid x_1 = x, x_2 = y) - 1 &\le \Pr(C_{12} \mid x_1 = x, x_2 = y) \\
&\le \Pr(D_{12} \mid x_1 = x, x_2 = y) + \Pr(D_{21} \mid x_1 = x, x_2 = y).
\end{aligned}
\]

Plugging in the results for $\Pr(D_{12})$ and $\Pr(D_{21})$ in the cases studied above, we obtain the result.
□
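The sandwich bounds between the directed events and the two undirected graph types can be illustrated by simulation. The sketch below is only an illustration under assumed conditions: it fixes two points, draws the remaining points uniformly on the unit square (a hypothetical density), and estimates the connection probabilities; all names are made up for this example.

```python
import random


def directed_edge(points, i, j, k):
    """True if points[j] is among the k nearest neighbors of points[i]."""
    xi, xj = points[i], points[j]
    d_ij = (xi[0] - xj[0]) ** 2 + (xi[1] - xj[1]) ** 2
    closer = 0
    for m, p in enumerate(points):
        if m in (i, j):
            continue
        if (xi[0] - p[0]) ** 2 + (xi[1] - p[1]) ** 2 < d_ij:
            closer += 1
    return closer < k


def estimate_edge_probabilities(x, y, n=30, k=5, trials=2000, seed=0):
    """Empirical frequencies of D12, D21, the mutual edge and the symmetric edge."""
    rng = random.Random(seed)
    counts = {"D12": 0, "D21": 0, "mutual": 0, "symmetric": 0}
    for _ in range(trials):
        points = [x, y] + [(rng.random(), rng.random()) for _ in range(n - 2)]
        d12 = directed_edge(points, 0, 1, k)
        d21 = directed_edge(points, 1, 0, k)
        counts["D12"] += d12
        counts["D21"] += d21
        counts["mutual"] += d12 and d21      # edge in the mutual kNN graph
        counts["symmetric"] += d12 or d21    # edge in the symmetric kNN graph
    return {name: c / trials for name, c in counts.items()}


if __name__ == "__main__":
    print(estimate_edge_probabilities((0.4, 0.5), (0.6, 0.5)))
```

Because the inequalities hold event by event, the empirical frequencies satisfy them exactly, not just in expectation.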



Lemma 3 (Integral over caps) Let the general assumptions hold and let $f : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ be a monotonically decreasing function and $s \in S$. Then we have for any $R \in \mathbb{R}_{>0}$

\[ \int_0^R \int_{B(s + t n_s, R) \cap H^-} f(\mathrm{dist}(s + t n_s, y))\, dy\, dt = \eta_{d-1} \int_{u=0}^R u^d f(u)\, du \]

and

\[ \int_{-R}^0 \int_{B(s + t n_s, R) \cap H^+} f(\mathrm{dist}(s + t n_s, y))\, dy\, dt = \eta_{d-1} \int_{u=0}^R u^d f(u)\, du. \]

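Proof. By a translation and rotation of our coordinate system in $\mathbb{R}^d$ such that $s + t n_s$ is the origin and $-n_s$ the first coordinate axis we obtain for $t \ge 0$

\[
\begin{aligned}
\int_{B(s + t n_s, R) \cap H^-} f(\mathrm{dist}(s + t n_s, y))\, dy &= \int_{B(0, R) \cap \{z_1 \ge t\}} f(\mathrm{dist}(0, z))\, dz \\
&= \int_{z_1 = t}^R \int_{\{z_2^2 + \ldots + z_d^2 \le R^2 - z_1^2\}} f(\mathrm{dist}(0, z))\, dz_d \ldots dz_2\, dz_1 \\
&= \int_{z_1 = t}^R \int_{\{z_2^2 + \ldots + z_d^2 \le R^2 - z_1^2\}} f\left(\sqrt{z_1^2 + z_2^2 + \ldots + z_d^2}\right) dz_d \ldots dz_2\, dz_1 \\
&= \int_{z_1 = t}^R A(z_1)\, dz_1,
\end{aligned}
\]

where we have set

\[ A(r) = \int_{\{z_2^2 + \ldots + z_d^2 \le R^2 - r^2\}} f\left(\sqrt{r^2 + z_2^2 + \ldots + z_d^2}\right) dz_d \ldots dz_2. \]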



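Thus,

\[
\begin{aligned}
\int_{t=0}^R \int_{B(s + t n_s, R) \cap H^-} f(\mathrm{dist}(s + t n_s, y))\, dy\, dt &= \int_{t=0}^R \int_{r=t}^R A(r)\, dr\, dt \\
&= \int_{r=0}^R \int_{t=0}^r A(r)\, dt\, dr = \int_{r=0}^R A(r) \int_{t=0}^r dt\, dr = \int_{r=0}^R r A(r)\, dr.
\end{aligned}
\]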

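Similarly, by the same translation and a rotation such that $n_s$ is the first coordinate axis we obtain for $t \le 0$

\[ \int_{B(s + t n_s, R) \cap H^+} f(\mathrm{dist}(s + t n_s, y))\, dy = \int_{B(0, R) \cap \{z_1 \ge -t\}} f(\mathrm{dist}(0, z))\, dz = \int_{z_1 = -t}^R A(z_1)\, dz_1, \]

that is,

\[
\begin{aligned}
\int_{t=-R}^0 \int_{B(s + t n_s, R) \cap H^+} f(\mathrm{dist}(s + t n_s, y))\, dy\, dt &= \int_{t=-R}^0 \int_{r=-t}^R A(r)\, dr\, dt \\
&= \int_{r=0}^R \int_{t=-r}^0 A(r)\, dt\, dr = \int_{r=0}^R A(r) \int_{t=-r}^0 dt\, dr = \int_{r=0}^R r A(r)\, dr.
\end{aligned}
\]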



Therefore, both the integrals we want to compute are equal to $\int_{r=0}^R r A(r)\, dr$, which we will treat in the following. First we are going to compute the $(d-1)$-dimensional integral $A(r)$. Setting $f_r(s) = f(\sqrt{r^2 + s^2})$ we can write $A(r)$ as the following integral in $\mathbb{R}^{d-1}$:

\[
\begin{aligned}
A(r) &= \int_{\{x_1^2 + \ldots + x_{d-1}^2 \le R^2 - r^2\}} f\left(\sqrt{r^2 + x_1^2 + \ldots + x_{d-1}^2}\right) dx_{d-1} \ldots dx_1 = \int_{\|x\| \le \sqrt{R^2 - r^2}} f_r(\|x\|)\, dx \\
&= \int_0^{\sqrt{R^2 - r^2}} (d-1)\, \eta_{d-1}\, s^{d-2} f_r(s)\, ds = (d-1)\, \eta_{d-1} \int_0^{\sqrt{R^2 - r^2}} s^{d-2} f\left(\sqrt{r^2 + s^2}\right) ds.
\end{aligned}
\]



Plugging in this expression for $A(r)$ we obtain

\[ \int_{r=0}^R r A(r)\, dr = (d-1)\, \eta_{d-1} \int_{r=0}^R \int_{s=0}^{\sqrt{R^2 - r^2}} r s^{d-2} f\left(\sqrt{r^2 + s^2}\right) ds\, dr. \]



Substituting with polar coordinates $(r, s) = (u\cos\theta, u\sin\theta)$ with $u \in [0, R]$ and $\theta \in [0, \pi/2]$, we have

\[
\begin{aligned}
\int_{r=0}^R \int_{s=0}^{\sqrt{R^2 - r^2}} r s^{d-2} f\left(\sqrt{r^2 + s^2}\right) ds\, dr &= \int_{u=0}^R \int_{\theta=0}^{\pi/2} u\cos\theta\; u^{d-2} \sin^{d-2}\theta\; f(u)\, u\, d\theta\, du \\
&= \int_{u=0}^R u^d f(u) \int_{\theta=0}^{\pi/2} \cos\theta\, \sin^{d-2}\theta\, d\theta\, du \\
&= \int_{u=0}^R u^d f(u) \left[ \frac{1}{d-1} \sin^{d-1}\theta \right]_{\theta=0}^{\pi/2} du = \frac{1}{d-1} \int_{u=0}^R u^d f(u)\, du.
\end{aligned}
\]

Combining the last two equations we obtain

\[ \int_{r=0}^R r A(r)\, dr = \eta_{d-1} \int_{u=0}^R u^d f(u)\, du. \]

Note that the integral exists due to the monotonicity of $f$ and the compactness of the interval $[0, R]$. □
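The identity of Lemma 3 can be checked numerically in a simple case. The following sketch treats $d = 2$ (so $\eta_{d-1} = \eta_1 = 2$) and evaluates the left-hand side as a nested midpoint-rule integral in the rotated coordinates of the proof; the function names, the grid size and the test functions are hypothetical choices for illustration only.

```python
import math


def midpoint(g, a, b, m):
    """Composite midpoint rule for the integral of g over [a, b] with m cells."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))


def cap_integral_2d(f, R, m=40):
    """Left-hand side of Lemma 3 for d = 2.

    In coordinates centered at s + t n_s the cap is {z : |z| <= R, z_1 >= t},
    so we integrate over t in [0, R], z_1 in [t, R] and |z_2| <= sqrt(R^2 - z_1^2).
    """
    def slice_integral(z1):
        w = math.sqrt(max(R * R - z1 * z1, 0.0))
        return midpoint(lambda z2: f(math.hypot(z1, z2)), -w, w, m)

    return midpoint(lambda t: midpoint(slice_integral, t, R, m), 0.0, R, m)


def radial_side_2d(f, R, m=2000):
    """Right-hand side eta_1 * int_0^R u^2 f(u) du with eta_1 = 2."""
    return 2.0 * midpoint(lambda u: u * u * f(u), 0.0, R, m)


if __name__ == "__main__":
    for f in (lambda u: 1.0, lambda u: math.exp(-u)):
        print(cap_integral_2d(f, 1.0), radial_side_2d(f, 1.0))
```

For the unit weight function and $R = 1$ both sides equal $2/3$, which matches the closed form $\eta_1 R^3/3$.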



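Corollary 1 (Unweighted kNN-graph) Let $G_n$ be the unweighted $k$-nearest neighbor graph and let $f_n$ be the unit weight function. Then

\[ \left| E\left( \frac{1}{n k_n} \sqrt[d]{\frac{n}{k_n}}\, \mathrm{cut}_n \right) - \frac{2\eta_{d-1}}{(d+1)\, \eta_d^{1+1/d}} \int_S p^{1-1/d}(s)\, ds \right| = O\left( \sqrt[d]{\frac{k_n}{n}} + \sqrt{\frac{\log n}{k_n}} \right) \]

and, for a suitable constant $C > 0$,

\[ \Pr\left( \left| \frac{1}{n k_n} \sqrt[d]{\frac{n}{k_n}}\, \mathrm{cut}_n - E\left( \frac{1}{n k_n} \sqrt[d]{\frac{n}{k_n}}\, \mathrm{cut}_n \right) \right| > \varepsilon \right) \le 2\exp\left( -C \varepsilon^2 n^{1-2/d} k_n^{2/d} \right). \]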



Proof. With Lemma 8 we have for any $s \in S \cap C$, plugging in the definition of $r_n(s)$,



p« ( r ( s )) - H*=L ( k - 



1+1/ d 



Vd-1 



l + l/d 



Therefore, 
2 



As)F^ (r w (.)) ds = 2 / M (^) 1+1/ %-i-V- (a) ds 



Kn \ 2%_1 



n — 1 



/- 1/d (s) ds. 



Multiplying this term with the factor $(k_n/(n-1))^{-1-1/d}$ we obtain a constant limit. We now multiply the inequality for the bias term in Proposition 1 with this factor and deal with the error terms.

For the first one we derive an upper bound on $F_C^{(1)}(r_n^{\max})$ similarly to above and obtain

-x-i/d 



(-V) 4 1) (c ax )(/- = o 

\n — 1 J V n 



For the second error term we have with $\delta_0 = 3$ and $f_n = 1$

-^r) 1 lld n- 5a f n (mi r n (x)) < n 2 r*" 3 = O (n" 1 ) 



For the last error term we have 



= 



d k n /logn 



Thus, considering that $n^{-1} \le \sqrt[d]{k_n/n}$, we obtain



1 An-\ 



■ CUtr. 



2r?d-i 



= o 



d k n /logn 
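For the variance term we have with Proposition 1 and $f_n(0) = 1$

\[
\begin{aligned}
\Pr\left( \left| \frac{1}{n k_n} \sqrt[d]{\frac{n-1}{k_n}}\, \mathrm{cut}_n - E\left( \frac{1}{n k_n} \sqrt[d]{\frac{n-1}{k_n}}\, \mathrm{cut}_n \right) \right| > \varepsilon \right) &= \Pr\left( \left| \mathrm{cut}_n - E(\mathrm{cut}_n) \right| > n k_n \sqrt[d]{\frac{k_n}{n-1}}\, \varepsilon \right) \\
&\le 2\exp\left( -C \varepsilon^2 n^{1-2/d} k_n^{2/d} \right).
\end{aligned}
\]

Since $1/n = O(\sqrt[d]{k_n/n})$ we can change $\sqrt[d]{(n-1)/k_n}$ in the scaling factor to $\sqrt[d]{n/k_n}$ without changing the convergence rate.

□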



Corollary 2 (Gaussian weights and $\frac{1}{\sigma_n}(k_n/n)^{1/d} \to 0$) Let $G_n$ be the $k_n$-nearest neighbor graph with Gaussian weight function and let $\frac{1}{\sigma_n}(k_n/n)^{1/d} \to 0$. Then



= o 



1 Jk n \ djk n t /logn 
a n V n I 



and, for a suitable constant C > 
Pr 



1 / n ( 1 / n 

cut„ — E — — , d — cut r 



fik n y k n 



Tik n y k n 



> £ < 



2exp(-(7e 2 n 1 - 2 / d fc^ 



Proof. According to Lemma 9 we have for all $s \in S \cap C$



g «" F (<?) 
r^ +1 ( S ) 



^M«))- 



%-i 



(d+ l)(27r)? d / 2 



< 2 



n(«) 



Plugging in $r_n(s) = \sqrt[d]{k_n/((n-1)\, \eta_d\, p(s))}$ we obtain



qd 



n-1 



l + l/d 



{n dP {s)) 1+1/d F ( j\r n {s)) 



Vd-l 



(rf+l)(2vr)9 d / 2 



< 2 



1 



k n 



and therefore 
T </'' 



, \ l+l/d -1-1 Id 



A',, 



(d+l)(27r)9 d / 2 



<2( W («))- 1 - 1/d (^{/ ? t^t J <Ci(4r 

\ cr„ V (n - l)w(s) / V CT > 



2/d 



for a suitable constant C\ > 0. Therefore 



n — 1 



i+i/d 



1-1/d 



2/ 1 ^(.,^M.,,d.-^ mj y^ ( . )d . 



n — 1 



< 2 / p'(s) 
JSnc 



< 2 / p 2 (s)C 1 
JSnc 



1+1/ d 





snc 


fn- 




\ k n 




1 k n 




V ncr n, 





-1-1/ d 



2 ( S )F«(r„( S )) d S -2 / P 2 ^) ,!"!,^ - 1 - 1 ^) d S 



(27r) d / 2 (d + l) J 



-l-l/d 



F^( r (s)) - Vd ~ l71d v- 1 - 1/d (s'\ 



d.s 



ds = 2Ci 



2/<j 
Now, we consider the error terms of Proposition 1. For the first one we have, using that $F_C^{(1)}(r_n^{\max}) = O((r_n^{\max})^{d+1}/\sigma_n^d)$ and, furthermore, $r_n^{\max} = O(\sqrt[d]{k_n/(n-1)})$



- _ i \ i+i/d nr~ 



ok 



71- 1 



l + l/d 



k \ 1+1 / d Ik 



n - 1 



O 



For the second error term we have with $\delta_0 = 4$



71—1 



l+l/d 



1 



n-«/„(inf,„(,)j<^n- (27r)d/M 



O (n 



For the third error term we have with / n (0) = O{o n ) and the monotonicity of /, 



l + l/d 



= 



logn 



For the variance term we have with Proposition [l] and f n (0) — (2tt) d ^ 2 cr n d for a suitable constant 
C" > 



c~ An — \ „/ erf* ,/n— 1 

cut„-E ' " d/ 



Pr 



^/2(0) 

where we have set C* = (27r) d C". 



cut„ 



> e = Pr |cut„ -E (cut n )| > 



<2exp (-C^n 1 " 2 /^^ , 



7ifc n d / k„ 



ai V 71 — 1 



Since $1/n = O(\sqrt[d]{k_n/n})$ we can change $\sqrt[d]{(n-1)/k_n}$ in the scaling factor to $1/(n k_n) \sqrt[d]{n/k_n}$ without changing the convergence rate. □



Corollary 3 (Gaussian weights and $\sigma_n (k_n/n)^{-1/d} \to 0$) We consider the kNN graph with Gaussian weight function. Let $\sigma_n (k_n/n)^{-1/d} \to 0$ and $n\sigma_n^{d+1} \to \infty$ for $n \to \infty$. Then there exists a constant $C > 0$ such that



E 



1 



cut n - 



2tt 



p 2 (s) ds 



= O 



1 



exp — C 



1 d / 
a„ V 71 



Furthermore, suppose $\sqrt[d]{k_n/n} \ge \sigma_n^\alpha$ for an $\alpha \in (0, 1)$ and $n$ sufficiently large. Then there exist non-negative random variables $D_n^{(1)}, D_n^{(2)}$ such that



cut, 



E 



cut, 



with Pr(L>i 1) > e) < 2exp(C 2 no-, d + 1 e 2 ) for a constant C 2 > 0, and Pr(D%> > a n ) < 1/ 



(2) 



Proof. With Lemma 10 we have for $\sqrt[d]{k_n/n}/\sigma_n$ sufficiently large



p 2 ( S )F^(r n (s))ds 



n Jsnc 



2tt Js 



p 2 (s) ds 



< 2 



p\s) 



snc 



1 



O exp 



1 / 1 d I k n 

4(p max '7d) 2/d \ o- n V rt 



ds 

2\ 



where we use that $p$ and $\mathcal{L}_{d-1}(S \cap C)$ are bounded.
Now we bound the error terms from Proposition 1 of the other difference



E 



n(n — 1)<7„ 



cut. 



n Jsnc 



For the first one we observe that with Lemma 
For the second one we have with Lemma [Till 



-(4V)-F«(iiif r n (x))) 
a n \ xec ) 



10 



we have Fq (r™ ax ) = 0(er„) and therefore 




For the third error term we observe that if $n$ is sufficiently large such that $\delta_n \le 1/2$ and $\xi_n \le 1/4$ then for all $x \in C$,



r n 0) 

Then we have with Lemma [lOl 



'(l-2£ n )(l-<5 n )fe„ > J K 



(n - l)p(x)r] d 



^Pm^Vdn 



— (f$\oo) - F^(M r-(x))) = O I exp 

(J n \ xGC J I 



i f _L V- 




Now we prove the bound for the variance term. Unfortunately, the bound in Proposition 1 based on McDiarmid's inequality does not give good results. Therefore we prove a bound on the variance term directly. We set $\overline{\mathrm{cut}}_n$ to be the cut in the complete graph with Gaussian weights on the sample and we set $\mathrm{cut}_n^{\mathrm{miss}}$ to be the sum of the weights of the edges that are in the cut but not in the kNN graph. Then $\mathrm{cut}_n = \overline{\mathrm{cut}}_n - \mathrm{cut}_n^{\mathrm{miss}}$ and we have

cut„ „, / cut„ 



n(n - l)cr„ 



E 



n(n - 1)<t„ 



cut„ 



n{n — l)(j, 



— E 



cut„ 



n(n — l)ay, 



cut' 1 



< 



CUtr, 



n(n - l)a n 



E 



cut. 



n(n - 1)<7„ 



n(n - l)a n 

cut™ iss 
n(n - l)cr„ 



-E 



■E 



cut" 



n(n — l)cr, 

cut™ iss 
n(n- l)cr„ 



The first deviation term is dealt with in Corollary [8] 

We denote with $\mathcal{D}$ the event that the $k$-nearest neighbor radius of all the points is greater than $r_n^{\min} = \sqrt[d]{k_n/(2p_{\max}\eta_d(n-1))}$. One can show similarly to the proof of Lemma [5] that $\Pr(\mathcal{D}^c) \le \exp(\log n - k_n/8)$ and thus $\Pr(\mathcal{D}^c) \le 1/n^3$ for sufficiently large $n$, since $k_n/\log n \to \infty$. If $\mathcal{D}$ holds, all the edges in $\mathrm{cut}_n^{\mathrm{miss}}$ must have weight lower than $f_n(r_n^{\min})$, whereas if $\mathcal{D}^c$ holds the maximum edge weight is $f_n(0)$. There are $n(n-1)$ possible edges and thus



E 



cut" 



n(n — l)u r . 



< , * n(n - l)/„(0) Pr(P c ) + 1 n(n - l)/„(C in ) Pr(2>) 
nyn — l)a n nyn — l)a n 



O 



1 1 



1 



d+l 



exp 



(C in ) 2 

2al 



O 



1 



1 



n 2 rr d+1 
11 On 



exp 



2al 



since $n\sigma_n^{d+1} \to \infty$ for $n \to \infty$.

Under the condition $\sqrt[d]{k_n/n} \ge \sigma_n^\alpha$ with $\alpha \in (0, 1)$ we have for sufficiently large $n$ and a suitable constant $C_1$



exp 



(C in )^ 

2a2 



< -l T exp(-C' 1( 7 r 2 / Q - 1 )) <a n , 
where we use that the exponential term converges to zero faster than any power of $\sigma_n$.
For the other term we clearly have for n sufficiently large 

* (A > -) < * (^k > & - ( j m < < h- 

Clearly, we can replace n(n—l) in the scaling factor by n 2 without changing the convergence rate. □ 



6.2.3 The volume term of the kNN graph 

Proposition 4 Let $G_n$ be the $k$-nearest neighbor graph with a monotonically decreasing weight function $f_n$ and let $H = H^+$ or $H = H^-$. Then



E 



O [ { -F^ (C ax ) J +0 fmm{/« ( M r n ( x )) n - s °,F^(oc) - F^(Mr n (x)) 



where we set $\delta_n = \sqrt{(4\delta_0 \log n)/k_n}$ for a $\delta_0 \ge 2$ in the definition of $r_n^-(x)$.
For the variance term we have for a suitable constant $C > 0$



Pr (|vol„(tf) - E (vol n (H))\ >e)< 2exp (^C ^ 

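Proof. Similarly to the proof for the cut we define for $i, j \in \{1, \ldots, n\}$, $i \ne j$ the random variable $W_{ij}$ as

\[ W_{ij} = \begin{cases} f_n(\mathrm{dist}(x_i, x_j)) & \text{if } x_i \in H \text{ and } (x_i, x_j) \text{ edge in } G_n \\ 0 & \text{otherwise} \end{cases} \]

and then have $E(\mathrm{vol}_n(H)) = n(n-1)\, E(W_{12})$. With a function $c(x, y)$ that indicates the probability of connectedness we obtain

\[ E(W_{12}^q) = \int_{H \cap C} \int_C f_n^q(\mathrm{dist}(x, y))\, c(x, y)\, p(y)\, dy\; p(x)\, dx. \]

Setting $\mathcal{R}_n = \{y \in H \cap C \mid \mathrm{dist}(y, \partial(H \cap C)) \le 2r_n^{\max}\}$ and $\mathcal{I}_n = (H \cap C) \setminus \mathcal{R}_n$ we can decompose the outer integral into integrals over $\mathcal{R}_n$ and $\mathcal{I}_n$.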

First suppose $x \in \mathcal{R}_n$ and let $c_n$ denote a bound on the probability that points in distance at least $r_n^{\max}$ are connected. Then, using $c_n \le 2\exp(-k_n/8)$ and Lemma 5,

\[
\begin{aligned}
\int_C f_n^q(\mathrm{dist}(x, y))\, c(x, y)\, p(y)\, dy &\le p_{\max} \int_{B(x, r_n^{\max}) \cap C} f_n^q(\mathrm{dist}(x, y))\, dy + f_n^q(r_n^{\max})\, c_n \int_C p(y)\, dy \\
&\le d\, \eta_d\, p_{\max} \int_0^{r_n^{\max}} u^{d-1} f_n^q(u)\, du + 2 f_n^q(r_n^{\max}) \exp(-k_n/8) \\
&= p_{\max} F_B^{(q)}(r_n^{\max}) + 2 f_n^q(r_n^{\max}) \exp(-k_n/8).
\end{aligned}
\]

As was explained in the proof for the cut we can replace the term $2 f_n^q(r_n^{\max}) \exp(-k_n/8)$ by the term

\[ p_{\max} \left( F_B^{(q)}(\infty) - F_B^{(q)}(r_n^{\max}) \right), \]

which is better suited, for example for the Gaussian.



Therefore, using that according to Lemma 11 the volume of TZ n is in 0( 



f q (dist(x,y))c(x,y)p(y) dy dx = O ( \^F B q) (r™ ax ) 



K n JC 



O 



d k n ( Iq) 



For $x \in \mathcal{I}_n$ we introduce as in the proof for the cut radii $r_n^-(x) \le r_n^{\max}$ and $r_n^+(x) \le r_n^{\max}$ that depend on $\delta_n$ and $\xi_n$ defined there. These radii approximate the true kNN radius. For a lower bound we obtain

\[
\begin{aligned}
\int_C f_n^q(\mathrm{dist}(x, y))\, c(x, y)\, p(y)\, dy \ge{} & F_B^{(q)}(r_n(x))\, p(x) - p_{\max} \left( F_B^{(q)}(r_n(x)) - F_B^{(q)}(r_n^-(x)) \right) \\
& - \left( \xi_n + 6\exp(-\delta_n^2 k_n/3) \right) p_{\max} F_B^{(q)}(r_n^{\max}).
\end{aligned}
\]

For some weight functions, especially the Gaussian, we can use

\[ F_B^{(q)}(r_n(x)) - F_B^{(q)}(r_n^-(x)) \le F_B^{(q)}(\infty) - F_B^{(q)}\Big(\inf_{x \in C} r_n^-(x)\Big), \]

whereas for other ones it is better to use

F B q) (r n (x)) - F B q) (r-(x)) = d Vd f u^ 1 f q (u) du 

J r n [x) 



< Vd f* inf r-(x) )(t n + 6 n )(r*r) a . 

\x£C J 

Similarly we obtain an upper bound, with an additional term $f_n^q(\inf_{x \in C} r_n(x)) \exp(-\delta_n^2 k_n/4)$ or $p_{\max}(F_B^{(q)}(\infty) - F_B^{(q)}(\inf_{x \in C} r_n(x)))$ bounding the influence of points that are further away than $r_n^+(x)$. Combining the bounds we obtain



fZ(dist(x,y))c(x,y)p(y) dy - / F { B q) (r n (x))p 2 (x) dx 

IX n JC JI n 



= O + exp (-6 2 n k n /3))Fg> (C ax )^ 

+ O (min {/« (jnf r"(x)) (£ n + 5„) (C ax ) d , (oo) - F B q) (jnf r" (*)) }) 
+ O Lin {/« (inf r„(ao) exp (-^fc„/4) , fjftoo) - F^inf r n (a:))}) . 

Setting $\delta_n = \sqrt{(4\delta_0 \log n)/k_n}$ we obtain $\exp(-\delta_n^2 k_n/3) \le n^{-\delta_0}$ and the same for $\exp(-\delta_n^2 k_n/4)$.
Clearly, for <5 > 2 we have n" 5 " < £„ and n" 5 " < (£„C ax ) d • Thus , wit h = 0(C ax ) = 

f q (dist(x,y))c(x,y)p(y) dy - f F B q) (r n (x))p 2 (x) dx 



In JC 



old^F® (C ax ) 



+ 0[min{f«[ inf r~(x) 



'k n \0gn \ k n (g) (q) . 

; ~~ ) F B M ~ F^ y ( inf r n (x, 



O Lin {/* f inf r„(ao) n- 5 «,F^(oo) - F^inf r n (*))}) 
Finally, by finding an upper bound on the integrand and the volume of $(H \cap C) \setminus \mathcal{I}_n$ we obtain

/ (r n (x))p(x) dx- [ F ( « ] (r n (x)) p 2 (x) dx 

Jx n June 

Combining all the bounds above we obtain the result for the bias term. The bound for the variance term can be obtained with McDiarmid's inequality similarly to the proof for the cut in Proposition 1. □




The following lemma is necessary for the proof of the general theorem for both the r-graph and the kNN-graph. It is an elementary lemma and therefore stated without proof.

Lemma 5 (Integration over balls) Let $f_n : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ be a monotonically decreasing function and $x \in \mathbb{R}^d$. Then we have for any $R \in \mathbb{R}_{>0}$

\[ \int_{B(x, R)} f_n(\mathrm{dist}(x, y))\, dy = d\, \eta_d \int_0^R u^{d-1} f_n(u)\, du. \]
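Lemma 5 is the usual reduction of a radially symmetric integral to a one-dimensional one, using that the sphere of radius $u$ has surface area $d\,\eta_d u^{d-1}$. The sketch below compares a Cartesian grid approximation of the ball integral in $d = 2$ with the radial formula; the function names, the grid sizes and the Gaussian-shaped test function are hypothetical choices for illustration.

```python
import math


def ball_integral_2d(f, R, m=300):
    """Midpoint-rule approximation of the integral of f(|y|) over the disk B(0, R)."""
    h = 2.0 * R / m
    total = 0.0
    for i in range(m):
        y1 = -R + (i + 0.5) * h
        for j in range(m):
            y2 = -R + (j + 0.5) * h
            r = math.hypot(y1, y2)
            if r <= R:  # keep only grid cells whose center lies in the disk
                total += f(r)
    return total * h * h


def radial_integral(f, R, d=2, m=20000):
    """d * eta_d * int_0^R u^(d-1) f(u) du, with eta_d the volume of the unit ball."""
    eta_d = math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)
    h = R / m
    return d * eta_d * h * sum(((i + 0.5) * h) ** (d - 1) * f((i + 0.5) * h) for i in range(m))


if __name__ == "__main__":
    f = lambda u: math.exp(-u * u / 2.0)
    print(ball_integral_2d(f, 2.0), radial_integral(f, 2.0))
```

For this test function both values are close to the closed form $2\pi(1 - e^{-2})$.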



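Corollary 4 (Unweighted kNN-graph) Let $G_n$ be the unweighted kNN graph with weight function $f_n = 1$ and let $H = H^+$ or $H = H^-$. Then we have for the bias term

\[ \left| E\left( \frac{\mathrm{vol}_n(H)}{n k_n} \right) - \int_H p(x)\, dx \right| = O\left( \sqrt[d]{\frac{k_n}{n}} + \sqrt{\frac{\log n}{k_n}} \right) \]

and for the variance term for a suitable constant $C > 0$

\[ \Pr\left( \left| \frac{\mathrm{vol}_n(H)}{n k_n} - E\left( \frac{\mathrm{vol}_n(H)}{n k_n} \right) \right| > \varepsilon \right) \le 2\exp\left( -C n \varepsilon^2 \right). \]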



Proof. With Lemma 8 we have, plugging in the definition of $r_n(x)$:



F { B\r n (x))p 2 (x) dx- 



Hnc 



k n 



-p 2 (x) dx — 



p{x) dx. 



H 



I Hnc ( n ~ l )mp{x) w n-1 
Therefore by multiplying the expression in Proposition 4 with $(n-1)/k_n$ we obtain for any $\delta_0 \ge 2$
vol n (H) 



nk n 



p(x) dx 



H 



-°(\rv^ )(rrx) ) 

() | ^r—^fn ( inf r~(x) ) n 



&0 



+ 



n-l kn I d jk n t /logn 



f n [ Jnf r„ (x) 



Using Fg \r™ ax ) ~ (n — l)/fc„ and /„ = 1 we obtain 



vo\ n (H) 



nk n 



p{x) dx 



H 



= o 



d Jkn , /logn 
V n 



For the variance term we use the bound in Proposition 4 and plug in $f_n(0) = 1$. □
Corollary 5 (Gaussian weights and $(k_n/n)^{1/d}/\sigma_n \to 0$) Consider the kNN graph with Gaussian weights and $(k_n/n)^{1/d}/\sigma_n \to 0$. Let $H = H^+$ or $H = H^-$. Then we have for the bias term

2 



o 



1 d I k n 

a„ V n 



d kn , /logn 

V n 



and for the variance term, for a suitable constant C > 0, 



Pr 



a n 

nk, 



cr" 



vol n (H)-E[ -^vo\ n {H) 



> e < 2exp -Cn 



Proof. According to Lemma 9 we have for all $x \in C$



F {q} (r Ml 

*(x) B 1 A )] (2tt)^/ 



< 3 



r n (x) 



Plugging in $r_n(x) = \sqrt[d]{k_n/((n-1)\, \eta_d\, p(x))}$ and dividing by $\eta_d\, p(x)$ we obtain for points in the support of $p$

2/d\ 

o": \ — I > ,..' l/-„(-'-|) : r--. =0 



kn 



(2-rr)i d / 2 p(x) 



kn 



Therefore, using the boundedness of p 



n-1 



n J JHnC 



p 2 (x)F i B 1 \r n (x)) Ax 



(2^/2 J H 



p(x) da; 



O 



2/d> 



Now, we consider the error terms from Proposition [4] of the other difference 



vol„(iJ)-< 



p 2 (x)F^(r n (x)) dx 



fine 



As we have seen above $\sigma_n^d\, \frac{n-1}{k_n}\, F_B^{(1)}(r_n^{\max})$ can be bounded by a constant. Thus we have for the first term

d ( n-l\ afkn'(i) 



kn 



F K b> (C ax ) = O 



For the second term we have for $n$ sufficiently large and setting $\delta_0 = 3$



n — 1 

k n 



f n ( inf r-{x) ) n" 5 ° < a% 



n — 1 



n — 1 

k n 



n- s " < n~ 2 . 



For the third term we have 



n — 1 \ 



m / d / "'n 
/J 



log n 

k n 



fn ( inf r„ (a:) ) < 

L x t O 





/log n 


n 


V fcn 


k n 


/logn 


n 


V fc„ 



^n/n (0) 



(27r) d / 2 ' 



For the variance term we have for a suitable constant $C > 0$

> e ) = Pr (|vol n (tf ) - E (vol n (i?))| > nk n a~ d e) 



Pr 



-^-vol„(iJ)-E ^vol n (ff) 



< 2exp -C 



lip. -Id 2 
~, II ^n a n £ 



where we have set C = {2it) d C' . □ 



< 2exp -C — 



— ?d 2 



-2d 



2 exp 



(2ir) d 
Corollary 6 (Gaussian weights and $(k_n/n)^{1/d}/\sigma_n \to \infty$) Let $G_n$ be the kNN graph with Gaussian weights. Then for the bias term for a constant $C_1 > 0$



E 



/ V0ln(ff) 



p 2 (x) dx 



H 



O | a/ h exp -Ci 

n \ \ a n v n 



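Let, furthermore, $\sqrt[d]{k_n/n} \ge \sigma_n^\alpha$ for an $\alpha \in (0, 1)$ and $n$ sufficiently large. Then there exist non-negative random variables $D_n^{(1)}, D_n^{(2)}$ such that

\[ \left| \frac{\mathrm{vol}_n(H)}{n^2} - E\left( \frac{\mathrm{vol}_n(H)}{n^2} \right) \right| \le O(\sigma_n) + D_n^{(1)} + D_n^{(2)}, \]

with $\Pr(D_n^{(1)} > \varepsilon) \le 2\exp(-C_2 n \sigma_n^{d+1} \varepsilon^2)$ for a constant $C_2 > 0$, and $\Pr(D_n^{(2)} > \sigma_n) \le 1/n^3$.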



Proof. With Lemma 10 we have for $n$ sufficiently large, such that $r_n(x)/\sigma_n$ is sufficiently large uniformly over all $x \in C$,



Hnc 



/ p 2 (x) dx 


<-j 


I H 


JHnC 



Fg\r n {x))-1 P 2 (x) dx 



O exp - 



1 (K 



i(PmnxVd) 2/d 0~l V n 



Now we bound the error terms from Proposition [4] of the other difference 

1 



E 



n(n — 1) 



Hnc 



p 2 (x)F^>(r n (x)) dx 



For the first error term we use that according to Lemma 10 $F_B^{(1)}(r_n^{\max})$ is bounded by one for $n$ sufficiently large. Therefore $\sqrt[d]{k_n/n}\, F_B^{(1)}(r_n^{\max}) = O(\sqrt[d]{k_n/n})$.

For the second and third error term we observe that if $n$ is sufficiently large such that $\delta_n \le 1/2$ and $\xi_n \le 1/4$ then


inf rJx) > inf r"(z) = inf W ~ , ^ > \ ~ . 

xec xec xec y (n - l)p(x)r]d y 4p max r/ d n 

and therefore, for both, the second and the third error term, 



F«(oo) - F^(M r n (x)) = O | exp | 



1 / _1_ Jkn 

4(4p max ?7 d ) 2 / d { o n V n 



The proof of the bound for the variance term is identical to the corresponding part in the proof of Corollary 3. Therefore, we do not repeat it here.

Clearly, we can replace n(n— 1) in the scaling factor by n 2 without changing the convergence rate. □ 



6.2.4 The main theorem for the kNN graph 

Proof, of Theorem [T] As discussed in Section |6.1| we can study the convergence of the bias and 
variance terms of the cut and the volume separately. 

For the unweighted graph we have with Corollary 1 that under the condition $k_n/\log n \to \infty$ the bias
term for the cut is in $O(\sqrt[d]{k_n/n} + \sqrt{\log n/k_n})$. For some $\varepsilon > 0$ the probability that the variance
term exceeds $\varepsilon$ is bounded by $2\exp(-C\varepsilon^2 n^{1-2/d} k_n^{2/d})$ for a suitable constant $C$. Clearly, the bias
term converges to zero under the condition $k_n/\log n \to \infty$. For the almost sure convergence of the
variance term we need the stricter condition in dimension $d = 1$. The convergence of the volume-term
follows with Corollary 4, since the requirements for this convergence are weaker. In the case
$d \geq 2$ we obtain the optimal rates by equating the two bounds of the bias term and checking that
the variance term converges as well at this rate. In the case $d = 1$ the optimal rate is determined
by the variance term.
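
The step of equating the two bias bounds can be sketched numerically. The following is only an illustration under the assumption that all constants hidden in the $O$-terms are set to one; the function names are hypothetical and not part of the paper. Solving $\sqrt[d]{k/n} = \sqrt{\log n/k}$ for $k$ gives $k = n^{2/(d+2)} (\log n)^{d/(d+2)}$.

```python
import math


def balanced_knn_k(n: int, d: int) -> float:
    """Candidate k_n that equates (k/n)^(1/d) and sqrt(log(n)/k).

    Assumption: constants in the O-terms are ignored (set to 1).
    """
    if d < 2:
        raise ValueError("for d = 1 the rate is governed by the variance term")
    log_n = math.log(n)
    return n ** (2.0 / (d + 2)) * log_n ** (d / (d + 2.0))


def bias_terms(n: int, d: int, k: float) -> tuple[float, float]:
    """Return the two bias bounds (k/n)^(1/d) and sqrt(log(n)/k)."""
    return (k / n) ** (1.0 / d), math.sqrt(math.log(n) / k)


if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7):
        k = balanced_knn_k(n, d=2)
        print(n, round(k, 1), bias_terms(n, 2, k))
```

With this choice both bounds equal $(\log n/n)^{1/(d+2)}$, and $k_n/\log n = (n/\log n)^{2/(d+2)}$ still tends to infinity.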

For the kNN graph with Gaussian weights and $r_n/\sigma_n \to \infty$ we need the stronger condition $r_n > \sigma_n^{\alpha}$
for an $\alpha \in (0,1)$ in order to show convergence of both the bias term and the variance term. Under
this condition we have according to Corollaries 3 and 6 that the bias term of both the cut and the
volume is in $O(r_n)$, since the exponential term converges as $\sigma_n$.

Furthermore, the almost sure convergence of the variance term can be shown with the Borel-Cantelli
lemma if $n\sigma_n^{d+1}/\log n \to \infty$ for $n \to \infty$.

For the kNN graph with Gaussian weights and $r_n/\sigma_n \to 0$, according to Corollary 2 the bias term
of the cut is in $O(r_n + (r_n/\sigma_n)^2 + \sqrt{\log n/k_n})$. The probability that the variance term of the cut
exceeds an $\varepsilon > 0$ is bounded by $2\exp(-C\varepsilon^2 n^{1-2/d} k_n^{2/d})$ for a suitable constant $C$, which is the same
expression as in the unweighted case. Therefore, we have almost sure convergence of the cut-term
to zero under the same conditions as for the unweighted kNN graph.

From Corollary 5 we can see that the convergence conditions for the volume are less strict than
those for the cut. □



6.3 The r-graph and the complete weighted graph 

This section consists of three parts: In the first one the convergence of the bias and variance term of
the cut is studied, whereas in the second part that convergence is studied for the volume. Combining
these results we can prove the main theorems on the convergence of NCut and CheegerCut for the
r-graph and the complete weighted graph.



Section 6.3.1 and Section 6.3.2 are built up similarly: First, a proposition for a general weight 
function is given. The results are stated in terms of the "cap" and "ball" integrals and some 
properties of the weight function. Then four corollaries follow, where the general result is applied 
to the complete weighted graph with Gaussian weight function and to the r-graph with the specific 
weight functions we consider in this paper. 

Some words on the proofs: The results on the bias terms for general weight functions can be shown 
analogously to the corresponding results for the kNN graph. Since the connectivity in these graphs 
given the position of two points is not random they are even simpler. Furthermore, all the error 
terms in the result for the kNN graph that are due to the uncertainty in the connectivity radius 
can be dropped for the r-graph and the complete weighted graph. Therefore, in the proof of the 
bias term of the cut we only discuss the adaptations that are made to the proof of the kNN graph. 

As explained in Section 6.1, the situation is different for the variance term, where the convergence
proof for the kNN graph would lead to suboptimal results when carried over to the other two
graphs. For this reason we give a different proof for the convergence of the variance term in the
proof of the general result for the cut. It can be easily carried over to the volume and thus we
omit it there.

As to the corollaries, we only prove two of them: that for the complete weighted graph and that
for the r-graph with Gaussian weights and $r_n/\sigma_n \to 0$ for $n \to \infty$. The proof of the corollary for
the unweighted graph is very simple; that of the corollary for the r-graph with Gaussian weights
and $\sigma_n/r_n \to 0$ is identical to the proof for the complete weighted graph, where we can ignore one
term.



The proofs in Section 6.3.2 are completely omitted: The general result on the bias term can be
proved analogously to that for the kNN graph, if the adaptations that are discussed in the proof
for the bias term of the cut are made. The general result on the variance term of the volume is
proved analogously to that on the variance term of the cut. The proofs of the corollaries also work
analogously to the corresponding proofs for the cut.



The proofs of the main theorems in Section 6.3.3 collect the bounds of the corollaries and identify 
the conditions that have to hold for the convergence of NCut and CheegerCut. 



6.3.1 The cut term in the r-graph and the complete weighted graph 



Proposition 6 (The cut in the r-neighborhood and the complete weighted graph) Let $(r_n)_{n \in \mathbb{N}}$
be a sequence that fulfills the conditions on parameter sequences of the r-neighborhood graph. Let
$G_n$ denote the r-neighborhood graph with parameter $r_n$ or the complete weighted graph on $x_1, \ldots, x_n$
with a monotonically decreasing weight function $f_n : \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$. We set

$$1_c = \begin{cases} 1 & \text{if } G_n \text{ is the complete weighted graph,} \\ 0 & \text{if } G_n \text{ is the } r_n\text{-neighborhood graph.} \end{cases}$$



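Then for the bias term

$$\left| E\left( \frac{\mathrm{cut}_n}{n(n-1) F_C^{(1)}(r_n)} \right) - 2 \int_S p^2(s)\, ds \right| = O\left( r_n + \frac{F_C^{(1)}(\infty) - F_C^{(1)}(r_n)}{F_C^{(1)}(r_n)}\, 1_c \right).$$

Furthermore, there are constants $C_1, C_2$ such that for the variance term

$$\Pr\left( \left| \frac{\mathrm{cut}_n}{n(n-1) F_C^{(1)}(r_n)} - E\left( \frac{\mathrm{cut}_n}{n(n-1) F_C^{(1)}(r_n)} \right) \right| > \varepsilon \right) \leq 2 \exp\left( - \frac{n \varepsilon^2 \left( F_C^{(1)}(r_n) \right)^2}{C_1 F_C^{(2)}(r_n) + C_2 \left( F_C^{(2)}(\infty) - F_C^{(2)}(r_n) \right) 1_c + 2 \varepsilon F_C^{(1)}(r_n) f_n(0)} \right).$$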



Proof. As was said in the introduction, we do not give the detailed proof of this proposition here,
since it is similar to the proof of the corresponding proposition for the kNN graph but simpler:
the radius $r_n$ is the same everywhere, that is, we can set $r_n^{\max} = r_n^{+}(s) = r_n^{-}(s) = r_n$ for all
$s \in S$. Furthermore, the connectivity is not random, that is, we can set $a_n = b_n = c_n = 0$ for the
r-neighborhood graph, whereas we set $a_n = 0$, $b_n = 1$ and $c_n = 1$ for the complete weighted graph.
We obtain

$$E(W_{12}) - 2 F_C^{(1)}(r_n) \int_S p^2(s)\, ds = O\left( F_C^{(1)}(r_n)\, r_n + \left( F_C^{(1)}(\infty) - F_C^{(1)}(r_n) \right) 1_c \right),$$

and thus the result for the bias term immediately.

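In order to bound the variance term we use a U-statistics argument. We have

$$\frac{\mathrm{cut}_n}{n(n-1) F_C^{(1)}(r_n)} = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{F_C^{(1)}(r_n)} W_{ij}.$$

For the upper bound on the properly rescaled variable $W_{ij}$ clearly

$$\frac{1}{F_C^{(1)}(r_n)} W_{ij} \leq \frac{1}{F_C^{(1)}(r_n)} f_n(0)$$

and for the variance

$$\mathrm{Var}\left( \frac{1}{F_C^{(1)}(r_n)} W_{ij} \right) = E\left( \left( \frac{1}{F_C^{(1)}(r_n)} W_{ij} \right)^2 \right) - \left( E\left( \frac{1}{F_C^{(1)}(r_n)} W_{ij} \right) \right)^2 \leq \frac{1}{\left( F_C^{(1)}(r_n) \right)^2} E\left( W_{ij}^2 \right).$$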
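With a Bernstein-type concentration inequality for U-statistics from Hoeffding (1963) we obtain

$$\Pr\left( \left| \frac{\mathrm{cut}_n}{n(n-1) F_C^{(1)}(r_n)} - E\left( \frac{\mathrm{cut}_n}{n(n-1) F_C^{(1)}(r_n)} \right) \right| > \varepsilon \right) \leq 2 \exp\left( - \frac{\lfloor n/2 \rfloor \varepsilon^2}{2\, \mathrm{Var}\left( \frac{1}{F_C^{(1)}(r_n)} W_{ij} \right) + \frac{2}{3} \frac{f_n(0)}{F_C^{(1)}(r_n)} \varepsilon} \right) \leq 2 \exp\left( - \frac{n \varepsilon^2 \left( F_C^{(1)}(r_n) \right)^2}{6 E\left( W_{ij}^2 \right) + 2 \varepsilon F_C^{(1)}(r_n) f_n(0)} \right),$$

where we have used $\lfloor n/2 \rfloor \geq n/3$ for $n \geq 2$.

Clearly, for $r_n \to 0$ we can find constants $C_1$ and $C_2$ (depending on $p$ and $S$) such that for $n$
sufficiently large $6 E(W_{ij}^2) \leq C_1 F_C^{(2)}(r_n) + C_2 \left( F_C^{(2)}(\infty) - F_C^{(2)}(r_n) \right) 1_c$. □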
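
To get a feeling for the variance bound of Proposition 6, one can evaluate it numerically. The sketch below assumes the bound has the form $2\exp\big(-n\varepsilon^2 (F_C^{(1)})^2 / (C_1 F_C^{(2)} + C_2 (F_C^{(2)}(\infty) - F_C^{(2)}) 1_c + 2\varepsilon F_C^{(1)} f_n(0))\big)$; the function name and the default values of the constants $C_1, C_2$ are hypothetical, since the paper does not specify them.

```python
import math


def cut_tail_bound(n, eps, f1, f2, f0, f2_inf=None, c1=1.0, c2=1.0, complete=False):
    """Evaluate the assumed Bernstein-type tail bound for the rescaled cut.

    f1, f2  : values of the cap integrals F_C^(1)(r_n), F_C^(2)(r_n)
    f0      : f_n(0), the maximum of the weight function
    f2_inf  : F_C^(2)(infinity), only used for the complete weighted graph
    c1, c2  : hypothetical constants (unknown in general)
    """
    tail = (f2_inf - f2) if complete else 0.0  # the 1_c term vanishes for the r-graph
    denom = c1 * f2 + c2 * tail + 2.0 * eps * f1 * f0
    return min(1.0, 2.0 * math.exp(-n * eps**2 * f1**2 / denom))


def floor_half_dominates_third(n_max):
    """Check floor(n/2) >= n/3 for 2 <= n <= n_max using integer arithmetic."""
    return all(3 * (n // 2) >= n for n in range(2, n_max + 1))


if __name__ == "__main__":
    # Unit weights in d = 2 with r = 0.1: F_C^(1) = F_C^(2) = 2 r^3 / 3, f_n(0) = 1.
    f = 2.0 * 0.1**3 / 3.0
    for n in (10**3, 10**5, 10**6, 10**7):
        print(n, cut_tail_bound(n, 0.1, f, f, 1.0))
    print(floor_half_dominates_third(10**4))
```

For small $n$ the bound exceeds one and is therefore vacuous; it only becomes informative once $n\varepsilon^2 F_C^{(1)}(r_n)$ is large compared to the constants.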



The following corollary can be proved by plugging the results of Lemma 8 into the bounds of
Proposition 6. We do not give the details here.

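Corollary 7 (Unweighted r-graph) For the r-neighborhood graph and the weight function $f_n = 1$
we obtain

$$\left| E\left( \frac{\mathrm{cut}_n}{n^2 r_n^{d+1}} \right) - \frac{2 \eta_{d-1}}{d+1} \int_S p^2(s)\, ds \right| = O(r_n)$$

and, for a suitable constant $C > 0$,

$$\Pr\left( \left| \frac{\mathrm{cut}_n}{n^2 r_n^{d+1}} - E\left( \frac{\mathrm{cut}_n}{n^2 r_n^{d+1}} \right) \right| > \varepsilon \right) \leq 2 \exp\left( -C n r_n^{d+1} \varepsilon^2 \right).$$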



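Corollary 8 (Complete weighted graph) Consider the complete weighted graph $G_n$ with Gaussian
weight function. Then we have for the bias term for any $\alpha \in (0,1)$

$$\left| E\left( \frac{\mathrm{cut}_n}{n^2 \sigma_n} \right) - \frac{2}{\sqrt{2\pi}} \int_S p^2(s)\, ds \right| = O(\sigma_n^{\alpha}).$$

For the variance term we can find a constant $C > 0$ such that for $n$ sufficiently large

$$\Pr\left( \left| \frac{\mathrm{cut}_n}{n^2 \sigma_n} - E\left( \frac{\mathrm{cut}_n}{n^2 \sigma_n} \right) \right| > \varepsilon \right) \leq 2 \exp\left( -C n \sigma_n^{d+1} \varepsilon^2 \right).$$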



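Proof. Let $r_n$ be a sequence with $r_n \to 0$ and $r_n/\sigma_n \to \infty$ for $n \to \infty$. We use the bound from
Proposition 6 and the fact that $F_C^{(1)}(r_n)/\sigma_n$ can be bounded by a constant due to Lemma 10 to
obtain

$$\left| E\left( \frac{\mathrm{cut}_n}{n(n-1) \sigma_n} \right) - \frac{2 F_C^{(1)}(r_n)}{\sigma_n} \int_S p^2(s)\, ds \right| = O\left( r_n + \frac{F_C^{(1)}(\infty) - F_C^{(1)}(r_n)}{\sigma_n} \right) = O\left( r_n + \exp\left( -\frac{1}{4} \frac{r_n^2}{\sigma_n^2} \right) \right).$$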

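On the other hand, using Lemma 10, the boundedness of $p$ and $\mathcal{L}_{d-1}(S \cap C)$, we have for $r_n/\sigma_n$
sufficiently large

$$\left| \frac{2 F_C^{(1)}(r_n)}{\sigma_n} \int_S p^2(s)\, ds - \frac{2}{\sqrt{2\pi}} \int_S p^2(s)\, ds \right| \leq 2 \left| \frac{F_C^{(1)}(r_n)}{\sigma_n} - \frac{1}{\sqrt{2\pi}} \right| \int_S p^2(s)\, ds = O\left( \exp\left( -\frac{1}{4} \frac{r_n^2}{\sigma_n^2} \right) \right).$$

Combining these two bounds and using $\log \sigma_n < 0$ for $n$ sufficiently large we obtain

$$\left| E\left( \frac{\mathrm{cut}_n}{n(n-1) \sigma_n} \right) - \frac{2}{\sqrt{2\pi}} \int_S p^2(s)\, ds \right| = O\left( r_n + \exp\left( -\frac{1}{4} \frac{r_n^2}{\sigma_n^2} \right) \right).$$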



r 2 \ 



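Setting $r_n = \sigma_n^{\alpha}$ we have to show that the exponential term converges as fast. We have

$$\frac{1}{\sigma_n^{\alpha}} \exp\left( -\frac{1}{4} \frac{r_n^2}{\sigma_n^2} \right) = \left( \sigma_n^{2\alpha - 2} \right)^{\frac{\alpha}{2 - 2\alpha}} \exp\left( -\frac{1}{4} \sigma_n^{2\alpha - 2} \right) \to 0$$

for $n \to \infty$, since $x^r \exp(-x) \to 0$ for $x \to \infty$ and all $r \in \mathbb{R}$.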
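
The claim that the exponential term is of smaller order than $\sigma_n^{\alpha}$ can be checked numerically. The sketch below (a hypothetical helper, not taken from the paper) evaluates the ratio $\sigma^{-\alpha} \exp(-\tfrac{1}{4}\sigma^{2\alpha-2})$, which arises from $r_n = \sigma_n^{\alpha}$, for decreasing values of $\sigma$.

```python
import math


def scaled_exponential_term(sigma: float, alpha: float) -> float:
    """Ratio exp(-(r/sigma)^2 / 4) / sigma^alpha for r = sigma^alpha."""
    x = sigma ** (2.0 * alpha - 2.0)  # equals (r / sigma)^2, grows as sigma -> 0
    return math.exp(-0.25 * x) / sigma**alpha


if __name__ == "__main__":
    for sigma in (0.5, 0.2, 0.1, 0.05, 0.01):
        print(sigma, scaled_exponential_term(sigma, alpha=0.5))
```

For $\alpha$ close to one the decay sets in only for very small $\sigma$, because $(r_n/\sigma_n)^2 = \sigma_n^{2\alpha-2}$ then grows slowly.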
For the variance term we have with Proposition 6 and for constants $C_1, C_2$
cut„ 



Pr 



— E 



cut, 



n(n — l)<r n \n(n — l)er 
cut„ 



> £ 



Pr 



< 2 exp 



(n-l)4 X) (r n ) \n{n-l)F^{r n ) 



E 



CUtr 



F ( c\r n ) 



With Lemma 



10 



(7r4 2) (r„) + C* 2 (4 2) M - ^ 2) (r„)) + 2eF^(r n ) f n (Q) J ' 
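we have for $r_n/\sigma_n$ sufficiently large $F_C^{(2)}(r_n) = O(\sigma_n^{1-d})$, and

$$F_C^{(2)}(\infty) - F_C^{(2)}(r_n) = O\left( \sigma_n^{1-d} \exp\left( -\frac{1}{4} \frac{r_n^2}{\sigma_n^2} \right) \right) = O\left( \sigma_n^{1-d} \right),$$

if we choose $r_n = \sigma_n^{\alpha}$ for $\alpha \in (0,1)$ similarly to above.

For the last term in the denominator we have $F_C^{(1)}(r_n) f_n(0) = O(\sigma_n \sigma_n^{-d}) = O(\sigma_n^{1-d})$. Therefore,
we can find a constant $C_3 > 0$ such that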

cut„ „ / cut 



Pr 



E 



: j 2<.x.p (-Ca^J) = 2exp (-OrfV) 



n(n — l)cr„ l)cr„ 
Since we assume that $n\sigma_n \to \infty$ for $n \to \infty$ we can replace $n(n-1)$ in the scaling factor by $n^2$. □



We do not state the proof of the following corollary, since it is similar to the proof of the last one.
The difference is that we do not have to consider the $1_c$-terms, which are zero in the case of the
r-graph.

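Corollary 9 (r-graph with Gaussian weights and $\sigma_n/r_n \to 0$) Let $G_n$ be the r-graph with
Gaussian weight function and let $\sigma_n/r_n \to 0$ for $n \to \infty$. Then we have for the bias term

$$\left| E\left( \frac{\mathrm{cut}_n}{n^2 \sigma_n} \right) - \frac{2}{\sqrt{2\pi}} \int_S p^2(s)\, ds \right| = O\left( r_n + \exp\left( -\frac{1}{4} \frac{r_n^2}{\sigma_n^2} \right) \right).$$

For the variance term we can find a constant $C_1 > 0$ such that

$$\Pr\left( \left| \frac{\mathrm{cut}_n}{n^2 \sigma_n} - E\left( \frac{\mathrm{cut}_n}{n^2 \sigma_n} \right) \right| > \varepsilon \right) \leq 2 \exp\left( -C_1 n \sigma_n^{d+1} \varepsilon^2 \right).$$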

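Corollary 10 (r-graph with Gaussian weights and $r_n/\sigma_n \to 0$) Consider the r-neighborhood
graph with Gaussian weight function and let $r_n/\sigma_n \to 0$ for $n \to \infty$. Then we can find a constant
$C > 0$ such that

$$\left| E\left( \frac{\sigma_n^d\, \mathrm{cut}_n}{r_n^{d+1} n^2} \right) - \frac{2 \eta_{d-1}}{(d+1)(2\pi)^{d/2}} \int_S p^2(s)\, ds \right| = O\left( r_n + \left( \frac{r_n}{\sigma_n} \right)^2 \right)$$

and

$$\Pr\left( \left| \frac{\sigma_n^d\, \mathrm{cut}_n}{r_n^{d+1} n^2} - E\left( \frac{\sigma_n^d\, \mathrm{cut}_n}{r_n^{d+1} n^2} \right) \right| > \varepsilon \right) \leq 2 \exp\left( -C n \varepsilon^2 r_n^{d+1} \right).$$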



Proof. Multiplying the bound in Proposition 6 with $\sigma_n^d F_C^{(1)}(r_n)/r_n^{d+1}$, which can be bounded by
a constant according to Lemma 9, and using $1_c = 0$ we obtain



<F^(r n ) cut n 
ri +1 n{n - 1) 



E 



„d+l 



p 2 (s) ds 



0(r n 



On the other hand, by the boundedness of $p$ and $\mathcal{L}_{d-1}(S \cap C)$, and with Lemma 9,



p 2 (s) ds 



d-l 



(d+l)(27T) d / 2 



p 2 (s) ds 



= O 



Combining these two bounds we obtain the result for the bias term.
For the variance term we have with Proposition 6 and for a constant $C_1$



Pr 



cut. 



ri +1 n(n - 1) 



— E 



= Pr 



CUtr, 



n(n-l)Fg\r n ) 



r d n +1 n(n - 1) 
-E| — 



> £ 
cut, 



n d+l 



> 



<F ( c\r n ) 



< 2exp 



CiF {2 \r n ) +2eF^\r n ) /„(0), 



With Lemma 9 we obtain $F_C^{(2)}(r_n) = O(r_n^{d+1}/\sigma_n^{2d})$ for sufficiently large $n$. With the same lemma
and plugging in $f_n(0)$ we obtain $F_C^{(1)}(r_n) f_n(0) = O(r_n^{d+1}/\sigma_n^{2d})$. Plugging in these results
above we obtain the bound for the variance term.

Since we always assume that $n r_n \to \infty$ for $n \to \infty$ we can replace $n(n-1)$ in the scaling factor by
$n^2$. □



6.3.2 The volume term in the r-graph and the complete weighted graph 

The following results are stated without proof: Proposition 7 can be proved analogously to
Proposition 4 if the remarks on the difference between the kNN graph and the r-neighborhood graph in the
proof of Proposition 6 are considered. The corollaries can be shown similarly to the corresponding
corollaries in the previous section.



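Proposition 7 Let $G_n$ be the $r_n$-neighborhood graph or the complete weighted graph with a weight
function $f_n$ and set $1_c$ as in Proposition 6. Then

$$\left| E\left( \frac{\mathrm{vol}_n(H)}{n(n-1) F_B^{(1)}(r_n)} \right) - \int_H p^2(x)\, dx \right| \leq O\left( r_n + \frac{F_B^{(1)}(\infty) - F_B^{(1)}(r_n)}{F_B^{(1)}(r_n)}\, 1_c \right).$$

For the variance term we have

$$\Pr\left( \left| \frac{\mathrm{vol}_n(H)}{n(n-1) F_B^{(1)}(r_n)} - E\left( \frac{\mathrm{vol}_n(H)}{n(n-1) F_B^{(1)}(r_n)} \right) \right| > \varepsilon \right) \leq 2 \exp\left( - \frac{n \varepsilon^2 \left( F_B^{(1)}(r_n) \right)^2}{C_1 F_B^{(2)}(r_n) + C_2 1_c \left( F_B^{(2)}(\infty) - F_B^{(2)}(r_n) \right) + 2 \varepsilon f_n(0) F_B^{(1)}(r_n)} \right).$$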
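Corollary 11 (Unweighted graph) For $f_n = 1$ and the $r_n$-neighborhood graph we have

$$\left| E\left( \frac{\mathrm{vol}_n(H)}{n^2 r_n^d} \right) - \eta_d \int_{H \cap C} p^2(x)\, dx \right| \leq O(r_n)$$

and, for a constant $C > 0$,

$$\Pr\left( \left| \frac{\mathrm{vol}_n(H)}{n^2 r_n^d} - E\left( \frac{\mathrm{vol}_n(H)}{n^2 r_n^d} \right) \right| > \varepsilon \right) \leq 2 \exp\left( -C n \varepsilon^2 r_n^d \right).$$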



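Corollary 12 (Complete weighted graph with Gaussian weights) Consider the complete weighted
graph with the Gaussian weight function and a parameter sequence $\sigma_n \to 0$. Then we have for any
$\alpha \in (0,1)$

$$\left| E\left( \frac{\mathrm{vol}_n(H)}{n^2} \right) - \int_H p^2(x)\, dx \right| = O(\sigma_n^{\alpha}).$$

Furthermore there is a constant $C > 0$ such that

$$\Pr\left( \left| \frac{\mathrm{vol}_n(H)}{n^2} - E\left( \frac{\mathrm{vol}_n(H)}{n^2} \right) \right| > \varepsilon \right) \leq \exp\left( -C n \varepsilon^2 \sigma_n^d \right).$$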



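Corollary 13 (r-graph with Gaussian weights and $\sigma_n/r_n \to 0$) Let $G_n$ be the r-neighborhood
graph with Gaussian weights and let $\sigma_n/r_n \to 0$ for $n \to \infty$. Then we have for the bias term for
sufficiently large $n$

$$\left| E\left( \frac{\mathrm{vol}_n(H)}{n^2} \right) - \int_H p^2(x)\, dx \right| = O\left( r_n + \exp\left( -\frac{1}{4} \frac{r_n^2}{\sigma_n^2} \right) \right)$$

and for the variance term for a suitable constant $C' > 0$

$$\Pr\left( \left| \frac{\mathrm{vol}_n(H)}{n^2} - E\left( \frac{\mathrm{vol}_n(H)}{n^2} \right) \right| > \varepsilon \right) \leq \exp\left( -C' n \varepsilon^2 \sigma_n^d \right).$$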



Corollary 14 (r-graph with Gaussian weights and $r_n/\sigma_n \to 0$) Let $G_n$ be the r-neighborhood
graph with Gaussian weights and let $r_n/\sigma_n \to 0$ for $n \to \infty$. Then we have for the bias term for
sufficiently large $n$



E 



vo\ n (H] 



Vd 



(2^/2 J H 



p 2 (x) dx 



O r n 



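and for the variance term for a suitable constant $C > 0$

$$\Pr\left( \left| \frac{\sigma_n^d}{n^2 r_n^d} \mathrm{vol}_n(H) - E\left( \frac{\sigma_n^d}{n^2 r_n^d} \mathrm{vol}_n(H) \right) \right| > \varepsilon \right) \leq 2 \exp\left( -C n \varepsilon^2 r_n^d \right).$$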



6.3.3 The main theorems for the r-graph and the complete weighted graph

Proof of Theorem 2. As discussed in Section 6.1, we can study the convergence of the bias and
variance terms of the cut and the volume separately.

For the unweighted r-graph we have with Corollary 7 that the bias term of the cut is in $O(r_n)$ and
that for $\varepsilon > 0$ we can find a constant $C$ such that the probability that the variance term of the cut
exceeds $\varepsilon$ is bounded by $2\exp(-C n r_n^{d+1} \varepsilon^2)$. Thus the cut-term converges almost surely to zero
for $r_n \to 0$ and $n r_n^{d+1}/\log n \to \infty$. It follows from Corollary 11 that under these conditions the
vol-term also converges to zero. The best convergence rate for the cut-term is $\sqrt[d+3]{\log n/n}$, which
is achieved setting $r_n \sim \sqrt[d+3]{\log n/n}$. Setting $r_n$ in this way the convergence rate of the vol-term
is also $\sqrt[d+3]{\log n/n}$.
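
The rate for the unweighted r-graph comes from balancing the bias $r_n$ against the deviation level $\varepsilon_n$ at which the exponent $n r_n^{d+1} \varepsilon_n^2$ is of order $\log n$. The sketch below illustrates this balance; the helper names are hypothetical and all constants are set to one, which is an assumption made only for illustration.

```python
import math


def balanced_radius(n: int, d: int) -> float:
    """Radius r_n = (log(n)/n)^(1/(d+3)) that balances bias and deviation."""
    return (math.log(n) / n) ** (1.0 / (d + 3))


def deviation_level(n: int, d: int, r: float) -> float:
    """The eps for which n * r^(d+1) * eps^2 equals log(n) (constant C = 1 assumed)."""
    return math.sqrt(math.log(n) / (n * r ** (d + 1)))


def variance_condition(n: int, d: int, r: float) -> float:
    """n * r^(d+1) / log(n), which has to tend to infinity."""
    return n * r ** (d + 1) / math.log(n)


if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7):
        r = balanced_radius(n, d=2)
        print(n, r, deviation_level(n, 2, r), variance_condition(n, 2, r))
```

At the balanced radius the bias and the deviation level coincide, and $n r_n^{d+1}/\log n = (n/\log n)^{2/(d+3)}$ grows without bound.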
For the r-graph with Gaussian weights and $r_n/\sigma_n \to \infty$ we have with Corollaries 9 and 13 that
the bias term of both the cut and the volume is in $O(r_n + \exp(-\tfrac{1}{4}(r_n/\sigma_n)^2))$. Furthermore, we
can find a constant $C > 0$ such that the probability that the variance term of the cut exceeds an
$\varepsilon > 0$ is bounded by $2\exp(-C n \sigma_n^{d+1} \varepsilon^2)$. Similarly, the variance term of the volume would converge
almost surely for $n\sigma_n^d/\log n \to \infty$. This implies almost sure convergence of $A_n$ to zero under the
condition $n\sigma_n^{d+1}/\log n \to \infty$ for $n \to \infty$.



For the r-graph with Gaussian weights and $r_n/\sigma_n \to 0$ we have with Corollary 10 a rate of
$O(r_n + (r_n/\sigma_n)^2)$ for the bias term of the cut. Furthermore, the probability that the variance term
exceeds an $\varepsilon > 0$ is bounded by $2\exp(-C n \varepsilon^2 r_n^{d+1})$ with a constant $C$. Therefore, the cut-term
almost surely converges to zero under the conditions $r_n \to 0$ and $n r_n^{d+1}/\log n \to \infty$. Under these
conditions with Corollary 14 the volume-term also converges to zero. □



Proof of Theorem 3. As discussed in Section 6.1, we can study the convergence of the bias and
variance terms of the cut and the volume separately.

With Corollaries 8 and 12 we have that the bias term of both the cut and the volume is in $O(\sigma_n^{\alpha})$
for any $\alpha \in (0,1)$. Furthermore, the probability that the variance term of the cut exceeds an
$\varepsilon > 0$ is bounded by $2\exp(-C n \sigma_n^{d+1} \varepsilon^2)$ with a suitable constant $C$. For the variance term of the
volume the exponent in this bound is only $d$. Consequently, we have almost sure convergence to
zero under the condition $n\sigma_n^{d+1}/\log n \to \infty$.

For any fixed $\alpha \in (0,1)$ the optimal convergence rate is achieved setting $\sigma_n = ((\log n)/n)^{1/(d+1+2\alpha)}$.
Since the variance term has to converge for any $\alpha \in (0,1)$ we choose $\sigma_n = ((\log n)/n)^{1/(d+3)}$ and
achieve a convergence rate of $\sigma_n^{\alpha}$ for any $\alpha \in (0,1)$. □
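
The two choices of $\sigma_n$ in this proof can be compared numerically. The sketch below uses hypothetical helper names and sets all constants to one (an assumption for illustration only): for fixed $\alpha$ it balances the bias $\sigma_n^{\alpha}$ against the deviation level $\sqrt{\log n/(n\sigma_n^{d+1})}$, and it evaluates the variance condition for the $\alpha$-independent choice with exponent $1/(d+3)$.

```python
import math


def balanced_sigma(n: int, d: int, alpha: float) -> float:
    """sigma_n = (log(n)/n)^(1/(d+1+2*alpha)), balancing bias and deviation for fixed alpha."""
    return (math.log(n) / n) ** (1.0 / (d + 1 + 2.0 * alpha))


def uniform_sigma(n: int, d: int) -> float:
    """sigma_n = (log(n)/n)^(1/(d+3)), the choice that works for every alpha in (0, 1)."""
    return (math.log(n) / n) ** (1.0 / (d + 3))


def deviation_level(n: int, d: int, sigma: float) -> float:
    """The eps for which n * sigma^(d+1) * eps^2 equals log(n) (constant C = 1 assumed)."""
    return math.sqrt(math.log(n) / (n * sigma ** (d + 1)))


if __name__ == "__main__":
    n, d = 10**6, 2
    for alpha in (0.25, 0.5, 0.9):
        s = balanced_sigma(n, d, alpha)
        print(alpha, s**alpha, deviation_level(n, d, s))
    s = uniform_sigma(n, d)
    print("uniform:", s, n * s ** (d + 1) / math.log(n))
```

With the uniform choice, $n\sigma_n^{d+1}/\log n = (n/\log n)^{2/(d+3)} \to \infty$, so the variance condition holds regardless of $\alpha$.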



6.4 The integrals F^\r) and the size of the boundary strips 
Lemma 8 (Unit weights) Let $f_n = 1$ be the unit weight function. Then for any $r > 0$

$$F_C^{(1)}(r) = F_C^{(2)}(r) = \frac{\eta_{d-1}}{d+1}\, r^{d+1}, \qquad F_B^{(1)}(r) = F_B^{(2)}(r) = \eta_d\, r^d.$$
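
For concreteness, the unit-weight integrals can be evaluated in a few lines. The sketch assumes the closed forms $F_C^{(q)}(r) = \eta_{d-1} r^{d+1}/(d+1)$ and $F_B^{(q)}(r) = \eta_d r^d$, where $\eta_d$ is taken to be the volume of the $d$-dimensional unit ball; the function names are hypothetical.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume eta_d of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def cap_integral_unit(r: float, d: int) -> float:
    """Assumed closed form of the cap integral for unit weights (same for q = 1, 2)."""
    return unit_ball_volume(d - 1) * r ** (d + 1) / (d + 1)


def ball_integral_unit(r: float, d: int) -> float:
    """Assumed closed form of the ball integral for unit weights (same for q = 1, 2)."""
    return unit_ball_volume(d) * r**d


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, unit_ball_volume(d), cap_integral_unit(0.1, d), ball_integral_unit(0.1, d))
```

The values do not depend on $q$ because the unit weight satisfies $f_n^q = f_n$.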

Lemma 9 (Gaussian weights and $r_n/\sigma_n \to 0$) Let $f_n$ denote the Gaussian weight function with
parameter $\sigma_n$ and let $r_n > 0$. Then we have for $q = 1, 2$ for the cap integral



Tjd-l 



For the ball integral Fg\r n ) we have 



(d+l)(27r)9 d / 2 



< 2 



„d B 



(2TT)1 d / 2 



< 3 



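Lemma 10 (Gaussian weights and $\sigma_n/r_n \to 0$) Let $f_n$ denote the Gaussian weight function
with a parameter $\sigma_n$ and let $r_n/\sigma_n > 4d$. Then we have $F_C^{(1)}(\infty) = \sigma_n/\sqrt{2\pi}$ and

$$\left| \frac{F_C^{(1)}(r_n)}{\sigma_n} - \frac{1}{\sqrt{2\pi}} \right| = O\left( \exp\left( -\frac{1}{4} \left( \frac{r_n}{\sigma_n} \right)^2 \right) \right).$$

Furthermore, $F_C^{(2)}(\infty) = O(\sigma_n^{1-d})$ and $F_C^{(2)}(\infty) - F_C^{(2)}(r_n) = O\left( \sigma_n^{1-d} \exp\left( -(r_n/\sigma_n)^2/4 \right) \right)$.

For the ball integral we have under the same conditions $F_B^{(1)}(\infty) = 1$ and

$$\left| F_B^{(1)}(r_n) - 1 \right| = O\left( \exp\left( -\frac{1}{4} \left( \frac{r_n}{\sigma_n} \right)^2 \right) \right).$$

Furthermore, $F_B^{(2)}(\infty) = O(\sigma_n^{-d})$ and $F_B^{(2)}(\infty) - F_B^{(2)}(r_n) = O\left( \sigma_n^{-d} \exp\left( -(r_n/\sigma_n)^2/4 \right) \right)$.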
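
The limits $F_C^{(1)}(\infty) = \sigma_n/\sqrt{2\pi}$ and $F_B^{(1)}(\infty) = 1$ can be checked by numerical integration. The sketch below rests on assumptions that are not stated in this section: it takes the cap integral to be $F_C^{(q)}(r) = \eta_{d-1} \int_0^r u^d f_n^q(u)\, du$, the ball integral to be $F_B^{(q)}(r) = d\,\eta_d \int_0^r u^{d-1} f_n^q(u)\, du$, and the Gaussian weight to be $f_n(u) = (2\pi\sigma_n^2)^{-d/2} \exp(-u^2/(2\sigma_n^2))$. All function names are hypothetical.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume eta_d of the d-dimensional unit ball."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)


def gaussian_weight(u: float, sigma: float, d: int) -> float:
    """Assumed Gaussian weight function, normalized as a d-dimensional density."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma) ** (d / 2.0)


def _midpoint(g, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def cap_integral(r: float, sigma: float, d: int, q: int = 1) -> float:
    """Assumed definition: eta_{d-1} * int_0^r u^d f(u)^q du."""
    return unit_ball_volume(d - 1) * _midpoint(
        lambda u: u**d * gaussian_weight(u, sigma, d) ** q, 0.0, r
    )


def ball_integral(r: float, sigma: float, d: int, q: int = 1) -> float:
    """Assumed definition: d * eta_d * int_0^r u^(d-1) f(u)^q du."""
    return d * unit_ball_volume(d) * _midpoint(
        lambda u: u ** (d - 1) * gaussian_weight(u, sigma, d) ** q, 0.0, r
    )


if __name__ == "__main__":
    sigma, d = 0.1, 2
    for ratio in (2.0, 4.0, 10.0):  # ratio = r / sigma
        r = ratio * sigma
        print(ratio, cap_integral(r, sigma, d) / sigma, ball_integral(r, sigma, d))
    print("limit of cap/sigma:", 1.0 / math.sqrt(2.0 * math.pi))
```

Under these assumed definitions the truncation error decays quickly in $r_n/\sigma_n$, in line with the exponential terms of the lemma.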

The following lemma is necessary to bound the influence of points close to the boundary on the 
cut and the volume. The first statement is used for the cut, whereas the second statement is used 
for the volume. 

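Lemma 11 Let the general assumptions hold and let $(r_n)_{n \in \mathbb{N}}$ be a sequence with $r_n \to 0$ for
$n \to \infty$. Define $\mathcal{R}_n = \{x \in \mathbb{R}^d \mid \mathrm{dist}(x, \partial C) \leq 2 r_n\}$. Then $\mathcal{L}_{d-1}(S \cap \mathcal{R}_n) = O(r_n)$.

For $H = H^+$ or $H = H^-$ define $\tilde{\mathcal{R}}_n = \{x \in H \cap C \mid \mathrm{dist}(x, \partial(H \cap C)) \leq 2 r_n\}$. Then $\mathcal{L}_d(\tilde{\mathcal{R}}_n) =
O(r_n)$.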